[jira] [Updated] (SPARK-4768) Add Support For Impala Encoded Timestamp (INT96)

2015-02-03 Thread Yin Huai (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai updated SPARK-4768:

Fix Version/s: 1.3.0

 Add Support For Impala Encoded Timestamp (INT96)
 

 Key: SPARK-4768
 URL: https://issues.apache.org/jira/browse/SPARK-4768
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Pat McDonough
Assignee: Yin Huai
Priority: Blocker
 Fix For: 1.3.0

 Attachments: 5e4481a02f951e29-651ee94ed14560bf_922627129_data.0.parq, string_timestamp.gz


 Impala uses INT96 to encode timestamps. Spark SQL should be able to read this data even though INT96 timestamps are not part of the Parquet spec.
 Adding a flag that makes the Parquet reader act like Impala (as we already do for strings) would be useful.
 Here's an example of the error you might see:
 {code}
 Caused by: java.lang.RuntimeException: Potential loss of precision: cannot convert INT96
 	at scala.sys.package$.error(package.scala:27)
 	at org.apache.spark.sql.parquet.ParquetTypesConverter$.toPrimitiveDataType(ParquetTypes.scala:61)
 	at org.apache.spark.sql.parquet.ParquetTypesConverter$.toDataType(ParquetTypes.scala:113)
 	at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$convertToAttributes$1.apply(ParquetTypes.scala:314)
 	at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$convertToAttributes$1.apply(ParquetTypes.scala:311)
 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
 	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
 	at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
 	at scala.collection.AbstractTraversable.map(Traversable.scala:105)
 	at org.apache.spark.sql.parquet.ParquetTypesConverter$.convertToAttributes(ParquetTypes.scala:310)
 	at org.apache.spark.sql.parquet.ParquetTypesConverter$.readSchemaFromFile(ParquetTypes.scala:441)
 	at org.apache.spark.sql.parquet.ParquetRelation.init(ParquetRelation.scala:66)
 	at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:141)
 {code}
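For context on why the type is 96 bits wide: an Impala INT96 timestamp packs two little-endian fields into 12 bytes — the nanoseconds elapsed within the day (8 bytes) followed by the Julian day number (4 bytes). A minimal decoding sketch in Python follows; this is not Spark's implementation, and the helper name and epoch constant are illustrative:

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of the Unix epoch, 1970-01-01.
JULIAN_DAY_OF_EPOCH = 2_440_588

def decode_int96_timestamp(raw: bytes) -> datetime:
    """Decode a 12-byte Impala INT96 timestamp into a UTC datetime.

    Layout (little-endian): 8 bytes of nanoseconds within the day,
    then 4 bytes holding the Julian day number.
    """
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_EPOCH
    return (datetime(1970, 1, 1, tzinfo=timezone.utc)
            + timedelta(days=days_since_epoch,
                        microseconds=nanos_of_day // 1000))
```

Note that the decode truncates sub-microsecond precision, since Python's `datetime` only resolves microseconds; a reader preserving full nanosecond precision would need a different representation.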



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4768) Add Support For Impala Encoded Timestamp (INT96)

2015-02-02 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-4768:

Priority: Blocker  (was: Critical)



[jira] [Updated] (SPARK-4768) Add Support For Impala Encoded Timestamp (INT96)

2015-02-02 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-4768:

Assignee: Yin Huai




[jira] [Updated] (SPARK-4768) Add Support For Impala Encoded Timestamp (INT96)

2015-01-07 Thread Taiji Okada (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Taiji Okada updated SPARK-4768:
---
Attachment: 5e4481a02f951e29-651ee94ed14560bf_922627129_data.0.parq

Attached a Parquet file, created using the following:

{code}
create table string_timestamp
(
  dummy string,
  timestamp1 timestamp
) stored as parquet;

insert into string_timestamp (dummy, timestamp1) values ('test row 1', '2015-01-02 20:54:05');
insert into string_timestamp (dummy, timestamp1) values ('test row 2', '1900-01-01');
insert into string_timestamp (dummy, timestamp1) values ('test row 3', '-12-31');
insert into string_timestamp (dummy, timestamp1) values ('test row 4', null);
select * from string_timestamp;
{code}
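Each timestamp inserted above is stored by Impala as a 12-byte INT96 value: little-endian nanoseconds-of-day (8 bytes) followed by the Julian day number (4 bytes). A hypothetical encoder sketch showing that layout — the function name and constant are illustrative, not Impala's actual code:

```python
import struct
from datetime import datetime, timezone

# Julian day number of the Unix epoch, 1970-01-01.
JULIAN_DAY_OF_EPOCH = 2_440_588

def encode_int96_timestamp(ts: datetime) -> bytes:
    """Pack a UTC datetime into Impala's 12-byte INT96 layout."""
    delta = ts - datetime(1970, 1, 1, tzinfo=timezone.utc)
    julian_day = delta.days + JULIAN_DAY_OF_EPOCH
    # delta.seconds/microseconds together give the time-of-day component.
    nanos_of_day = (delta.seconds * 1_000_000 + delta.microseconds) * 1_000
    return struct.pack("<qi", nanos_of_day, julian_day)
```

For example, the value '2015-01-02 20:54:05' from the first insert corresponds to Julian day 2457025 with 75245 seconds elapsed in the day, expressed in nanoseconds.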




[jira] [Updated] (SPARK-4768) Add Support For Impala Encoded Timestamp (INT96)

2014-12-19 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-4768:

Priority: Critical  (was: Major)




[jira] [Updated] (SPARK-4768) Add Support For Impala Encoded Timestamp (INT96)

2014-12-05 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-4768:

Target Version/s: 1.3.0
