Re: SQLContext.applySchema strictness

2015-02-15 Thread Michael Armbrust
Applying a schema is a pretty low-level operation, and I would expect most
users to use the type-safe interfaces instead.  If you are unsure, you can
always run:

import org.apache.spark.sql.execution.debug._
schemaRDD.typeCheck()

and it will tell you if you have made any mistakes.
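
For example (a sketch against the Spark 1.2 API, reusing the mismatched RDD
from Justin's original post further down the thread):

import org.apache.spark.sql._
import org.apache.spark.sql.execution.debug._

val sqlContext = new SQLContext(sc)
val struct = StructType(List(StructField("test", BooleanType, true)))
val myData = sc.parallelize(List(Row(0), Row(true), Row("stuff")))
val schemaData = sqlContext.applySchema(myData, struct)

// Forces a scan of the data and reports rows whose runtime types do not
// match the declared schema, instead of failing later with an opaque
// cast error.
schemaData.typeCheck()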

Michael

On Sat, Feb 14, 2015 at 1:05 PM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 Would it make sense to add an optional validate parameter to applySchema()
 which defaults to False, both to give users the option to check the schema
 immediately and to make the default behavior clearer?

 On Sat Feb 14 2015 at 9:18:59 AM Michael Armbrust mich...@databricks.com
 wrote:

 Doing runtime type checking is very expensive, so we only do it when
 necessary (i.e., when you perform an operation like adding two columns
 together).

 On Sat, Feb 14, 2015 at 2:19 AM, nitin nitin2go...@gmail.com wrote:

 AFAIK, this is the expected behavior. You have to make sure that the
 schema matches the rows. Applying the schema won't raise any error,
 because it doesn't validate the data itself.








Re: SQLContext.applySchema strictness

2015-02-14 Thread nitin
AFAIK, this is the expected behavior. You have to make sure that the schema
matches the rows. Applying the schema won't raise any error, because it
doesn't validate the data itself.
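
For instance, reusing the schema from Justin's snippet below, the rows have
to actually carry Booleans for later field access to be safe (a sketch,
assuming the same Spark 1.2 setup):

import org.apache.spark.sql._

val sqlContext = new SQLContext(sc)
val struct = StructType(List(StructField("test", BooleanType, true)))

// Every row matches the declared BooleanType, so downstream access is safe.
val goodData = sc.parallelize(List(Row(true), Row(false)))
val schemaData = sqlContext.applySchema(goodData, struct)
schemaData.collect()(0).getBoolean(0) // true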






Re: SQLContext.applySchema strictness

2015-02-14 Thread Michael Armbrust
Doing runtime type checking is very expensive, so we only do it when
necessary (i.e., when you perform an operation like adding two columns
together).
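
In other words, the mismatch only surfaces once a query actually evaluates
the affected column. A sketch (assuming the sqlContext, myData, and struct
definitions from Justin's post below; the exact exception may vary):

// applySchema itself never looks at the data, so this succeeds even
// though Row(0) does not contain a Boolean.
val schemaData = sqlContext.applySchema(myData, struct)
schemaData.registerTempTable("t")

// The types are consulted only when the column is evaluated; this query
// fails at runtime (e.g. with a ClassCastException) on the bad rows.
sqlContext.sql("SELECT test FROM t WHERE test = true").collect()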

On Sat, Feb 14, 2015 at 2:19 AM, nitin nitin2go...@gmail.com wrote:

 AFAIK, this is the expected behavior. You have to make sure that the schema
 matches the rows. Applying the schema won't raise any error, because it
 doesn't validate the data itself.







Re: SQLContext.applySchema strictness

2015-02-14 Thread Nicholas Chammas
Would it make sense to add an optional validate parameter to applySchema()
which defaults to False, both to give users the option to check the schema
immediately and to make the default behavior clearer?
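
Until something like that exists, a user-side wrapper can approximate it (a
hypothetical helper, not a real Spark API; note that eager validation costs
a full scan of the data):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql._

// Hypothetical helper: applies the schema and, when validate is true,
// eagerly type-checks every row via Spark's debug support.
def applySchemaValidated(sqlContext: SQLContext, rdd: RDD[Row],
    schema: StructType, validate: Boolean = false): SchemaRDD = {
  import org.apache.spark.sql.execution.debug._
  val result = sqlContext.applySchema(rdd, schema)
  if (validate) result.typeCheck() // full scan; fails fast on any mismatch
  result
}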

On Sat Feb 14 2015 at 9:18:59 AM Michael Armbrust mich...@databricks.com
wrote:

 Doing runtime type checking is very expensive, so we only do it when
 necessary (i.e., when you perform an operation like adding two columns
 together).

 On Sat, Feb 14, 2015 at 2:19 AM, nitin nitin2go...@gmail.com wrote:

 AFAIK, this is the expected behavior. You have to make sure that the
 schema matches the rows. Applying the schema won't raise any error,
 because it doesn't validate the data itself.








SQLContext.applySchema strictness

2015-02-13 Thread Justin Pihony
Per the documentation:

  It is important to make sure that the structure of every Row of the
provided RDD matches the provided schema. Otherwise, there will be runtime
exception.

However, it appears that this is not being enforced. 

import org.apache.spark.sql._
val sqlContext = new SQLContext(sc)
val struct = StructType(List(StructField("test", BooleanType, true)))
val myData = sc.parallelize(List(Row(0), Row(true), Row("stuff")))
val schemaData = sqlContext.applySchema(myData, struct) // No error
schemaData.collect()(0).getBoolean(0) // Only now will I receive an error

Is this expected or a bug?

Thanks,
Justin






Re: SQLContext.applySchema strictness

2015-02-13 Thread Yin Huai
Hi Justin,

It is expected. We do not check whether the provided schema matches the
rows, since doing so would require scanning all of the rows to give a
correct answer.
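
If you want that check up front, you can pay for the scan yourself before
applying the schema. A rough sketch (it only covers a few common types and
assumes the myData/struct names from Justin's post):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql._

// Counts rows whose runtime classes don't line up with the declared field
// types. This is a full pass over the data, which is exactly the cost
// applySchema avoids by not validating.
def mismatches(rdd: RDD[Row], schema: StructType): Long =
  rdd.filter { row =>
    !row.toSeq.zip(schema.fields).forall { case (value, field) =>
      value == null || (field.dataType match {
        case BooleanType => value.isInstanceOf[Boolean]
        case IntegerType => value.isInstanceOf[Int]
        case StringType  => value.isInstanceOf[String]
        case _           => true // types this sketch doesn't cover
      })
    }
  }.count()

require(mismatches(myData, struct) == 0, "rows do not match the schema")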

Thanks,

Yin

On Fri, Feb 13, 2015 at 1:33 PM, Justin Pihony justin.pih...@gmail.com
wrote:

 Per the documentation:

   It is important to make sure that the structure of every Row of the
 provided RDD matches the provided schema. Otherwise, there will be runtime
 exception.

 However, it appears that this is not being enforced.

 import org.apache.spark.sql._
 val sqlContext = new SQLContext(sc)
 val struct = StructType(List(StructField("test", BooleanType, true)))
 val myData = sc.parallelize(List(Row(0), Row(true), Row("stuff")))
 val schemaData = sqlContext.applySchema(myData, struct) // No error
 schemaData.collect()(0).getBoolean(0) // Only now will I receive an error

 Is this expected or a bug?

 Thanks,
 Justin







Re: SQLContext.applySchema strictness

2015-02-13 Thread Justin Pihony
OK, but what about on an action, like collect()? Shouldn't it be able to
determine the correctness at that time?

On Fri, Feb 13, 2015 at 4:49 PM, Yin Huai yh...@databricks.com wrote:

 Hi Justin,

 It is expected. We do not check whether the provided schema matches the
 rows, since doing so would require scanning all of the rows to give a
 correct answer.

 Thanks,

 Yin

 On Fri, Feb 13, 2015 at 1:33 PM, Justin Pihony justin.pih...@gmail.com
 wrote:

 Per the documentation:

   It is important to make sure that the structure of every Row of the
 provided RDD matches the provided schema. Otherwise, there will be runtime
 exception.

 However, it appears that this is not being enforced.

 import org.apache.spark.sql._
 val sqlContext = new SQLContext(sc)
 val struct = StructType(List(StructField("test", BooleanType, true)))
 val myData = sc.parallelize(List(Row(0), Row(true), Row("stuff")))
 val schemaData = sqlContext.applySchema(myData, struct) // No error
 schemaData.collect()(0).getBoolean(0) // Only now will I receive an error

 Is this expected or a bug?

 Thanks,
 Justin


