[jira] [Comment Edited] (SPARK-8824) Support Parquet time related logical types

2016-09-27 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15525313#comment-15525313 ] Cheng Lian edited comment on SPARK-8824 at 9/27/16 7:09 AM: Since we've

[jira] [Commented] (SPARK-8824) Support Parquet time related logical types

2016-09-27 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15525313#comment-15525313 ] Cheng Lian commented on SPARK-8824: --- Since we've already upgraded parquet-mr in Spark master to 1.8.1

[jira] [Commented] (SPARK-17572) Write.df is failing on spark cluster

2016-09-20 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506051#comment-15506051 ] Cheng Lian commented on SPARK-17572: Yea, I know you are not using HDFS. But Spark always uses Hadoop

[jira] [Commented] (SPARK-17572) Write.df is failing on spark cluster

2016-09-20 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505921#comment-15505921 ] Cheng Lian commented on SPARK-17572: Which version of Hadoop are you using? Does it work when you

[jira] [Updated] (SPARK-17572) Write.df is failing on spark cluster

2016-09-20 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-17572: --- Description: Hi, We have spark cluster with four nodes, all four nodes have NFS partition shared

[jira] [Resolved] (SPARK-17289) Sort based partial aggregation breaks due to SPARK-12978

2016-08-30 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-17289. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14865 [https

[jira] [Updated] (SPARK-17289) Sort based partial aggregation breaks due to SPARK-12978

2016-08-30 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-17289: --- Assignee: Takeshi Yamamuro > Sort based partial aggregation breaks due to SPARK-12

[jira] [Updated] (SPARK-16283) Implement percentile_approx SQL function

2016-08-25 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16283: --- Assignee: (was: Sean Zhong) > Implement percentile_approx SQL funct

[jira] [Updated] (SPARK-16283) Implement percentile_approx SQL function

2016-08-25 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16283: --- Assignee: Sean Zhong > Implement percentile_approx SQL funct

[jira] [Created] (SPARK-17182) CollectList and CollectSet should be marked as non-deterministic

2016-08-22 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-17182: -- Summary: CollectList and CollectSet should be marked as non-deterministic Key: SPARK-17182 URL: https://issues.apache.org/jira/browse/SPARK-17182 Project: Spark

Re: Spark-2.0.0 fails reading a parquet dataset generated by Spark-1.6.2

2016-08-12 Thread Cheng Lian
OK, I've merged this PR to master and branch-2.0. On 8/11/16 8:27 AM, Cheng Lian wrote: Haven't figured out exactly how it failed, but the leading underscore in the partition directory name looks suspicious. Could you please try this PR to see whether it fixes the issue: https

[jira] [Resolved] (SPARK-16975) Spark-2.0.0 unable to infer schema for parquet data written by Spark-1.6.2

2016-08-12 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16975. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Updated] (SPARK-16975) Spark-2.0.0 unable to infer schema for parquet data written by Spark-1.6.2

2016-08-12 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16975: --- Assignee: Dongjoon Hyun > Spark-2.0.0 unable to infer schema for parquet data written by Spark-1.

Re: Spark-2.0.0 fails reading a parquet dataset generated by Spark-1.6.2

2016-08-10 Thread Cheng Lian
Haven't figured out exactly how it failed, but the leading underscore in the partition directory name looks suspicious. Could you please try this PR to see whether it fixes the issue: https://github.com/apache/spark/pull/14585/files Cheng On 8/9/16 5:38 PM, immerrr again wrote:

[jira] [Resolved] (SPARK-16867) createTable and alterTable in ExternalCatalog should not take db

2016-08-04 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16867. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14476 [https

[jira] [Commented] (SPARK-16842) Concern about disallowing user-given schema for Parquet and ORC

2016-08-02 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403567#comment-15403567 ] Cheng Lian commented on SPARK-16842: First of all, the cost of schema discovery can be heavy when

Re: Review Request 50502: HIVE-14294: HiveSchemaConverter for Parquet doesn't translate TINYINT and SMALLINT into proper Parquet types

2016-07-27 Thread Cheng Lian
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50502/#review143754 --- Ship it! Ship It! - Cheng Lian On July 27, 2016, 1:05 p.m

[jira] [Updated] (SPARK-16621) Generate stable SQLs in SQLBuilder

2016-07-26 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16621: --- Assignee: Dongjoon Hyun > Generate stable SQLs in SQLBuil

[jira] [Resolved] (SPARK-16621) Generate stable SQLs in SQLBuilder

2016-07-26 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16621. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14257 [https

[jira] [Updated] (SPARK-16666) Kryo encoder for custom complex classes

2016-07-26 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16666: --- Description: I'm trying to create a dataset with some geo data using spark and esri. If `Foo` only

[jira] [Updated] (SPARK-16666) Kryo encoder for custom complex classes

2016-07-26 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16666: --- Description: I'm trying to create a dataset with some geo data using spark and esri. If `Foo` only

[jira] [Updated] (SPARK-16734) Make sure examples in all language bindings are consistent

2016-07-26 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16734: --- Priority: Minor (was: Major) > Make sure examples in all language bindings are consist

[jira] [Resolved] (SPARK-16663) desc table should be consistent between data source and hive serde tables

2016-07-26 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16663. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14302 [https

[jira] [Created] (SPARK-16734) Make sure examples in all language bindings are consistent

2016-07-26 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-16734: -- Summary: Make sure examples in all language bindings are consistent Key: SPARK-16734 URL: https://issues.apache.org/jira/browse/SPARK-16734 Project: Spark Issue

[jira] [Resolved] (SPARK-16706) support java map in encoder

2016-07-26 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16706. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14344 [https

[jira] [Updated] (SPARK-16698) json parsing regression - "." in keys

2016-07-25 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16698: --- Assignee: Hyukjin Kwon > json parsing regression - "."

[jira] [Resolved] (SPARK-16698) json parsing regression - "." in keys

2016-07-25 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16698. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Updated] (SPARK-16668) Test parquet reader for row groups containing both dictionary and plain encoded pages

2016-07-25 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16668: --- Assignee: Sameer Agarwal > Test parquet reader for row groups containing both dictionary and pl

[jira] [Resolved] (SPARK-16668) Test parquet reader for row groups containing both dictionary and plain encoded pages

2016-07-25 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16668. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14304 [https

[jira] [Resolved] (SPARK-16691) move BucketSpec to catalyst module and use it in CatalogTable

2016-07-25 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16691. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14331 [https

[jira] [Resolved] (SPARK-16660) CreateViewCommand should not take CatalogTable

2016-07-25 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16660. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14297 [https

[jira] [Updated] (SPARK-16703) Extra space in WindowSpecDefinition SQL representation

2016-07-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16703: --- Description: For a {{WindowSpecDefinition}} whose {{partitionSpec}} is empty, there's an extra

[jira] [Updated] (SPARK-16703) Extra space in WindowSpecDefinition SQL representation

2016-07-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16703: --- Description: For a {{WindowSpecDefinition}} whose {{partitionSpec}} is empty, there's an extra

[jira] [Created] (SPARK-16703) Extra space in WindowSpecDefinition SQL representation

2016-07-24 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-16703: -- Summary: Extra space in WindowSpecDefinition SQL representation Key: SPARK-16703 URL: https://issues.apache.org/jira/browse/SPARK-16703 Project: Spark Issue

[jira] [Commented] (SPARK-16646) LEAST doesn't accept numeric arguments with different data types

2016-07-22 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389141#comment-15389141 ] Cheng Lian commented on SPARK-16646: Could you please help check Hive's behavior here? Especially

[jira] [Commented] (SPARK-16632) Vectorized parquet reader fails to read certain fields from Hive tables

2016-07-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387791#comment-15387791 ] Cheng Lian commented on SPARK-16632: Oh, I see, thanks for the explanation. > Vectorized parq

[jira] [Commented] (SPARK-16646) LEAST doesn't accept numeric arguments with different data types

2016-07-20 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387172#comment-15387172 ] Cheng Lian commented on SPARK-16646: Thanks for the help! I'm not working on this. > LEAST does

[jira] [Updated] (SPARK-16646) LEAST doesn't accept numeric arguments with different data types

2016-07-20 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16646: --- Reporter: Cheng Lian (was: liancheng) > LEAST doesn't accept numeric arguments with different d

[jira] [Updated] (SPARK-16646) LEAST doesn't accept numeric arguments with different data types

2016-07-20 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16646: --- Assignee: Hyukjin Kwon > LEAST doesn't accept numeric arguments with different data ty

[jira] [Updated] (SPARK-16648) LAST_VALUE(FALSE) OVER () throws IndexOutOfBoundsException

2016-07-20 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16648: --- Reporter: Cheng Lian (was: liancheng) > LAST_VALUE(FALSE) OVER () throws IndexOutOfBoundsExcept

[jira] [Commented] (HIVE-14294) HiveSchemaConverter for Parquet doesn't translate TINYINT and SMALLINT into proper Parquet types

2016-07-20 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-14294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385452#comment-15385452 ] Cheng Lian commented on HIVE-14294: --- Hit this issue while investigating SPARK-16632

[jira] [Commented] (SPARK-16632) Vectorized parquet reader fails to read certain fields from Hive tables

2016-07-20 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385451#comment-15385451 ] Cheng Lian commented on SPARK-16632: Discussed with [~yhuai] after merging [PR #14272|https

[jira] [Updated] (SPARK-16632) Vectorized parquet reader fails to read certain fields from Hive tables

2016-07-19 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16632: --- Assignee: Marcelo Vanzin > Vectorized parquet reader fails to read certain fields from Hive tab

[jira] [Commented] (SPARK-16632) Vectorized parquet reader fails to read certain fields from Hive tables

2016-07-19 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385399#comment-15385399 ] Cheng Lian commented on SPARK-16632: [~vanzin] Did you post the wrong stack trace? This issue

[jira] [Resolved] (SPARK-16632) Vectorized parquet reader fails to read certain fields from Hive tables

2016-07-19 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16632. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14272 [https

[jira] [Created] (HIVE-14294) HiveSchemaConverter for Parquet doesn't translate TINYINT and SMALLINT into proper Parquet types

2016-07-19 Thread Cheng Lian (JIRA)
Cheng Lian created HIVE-14294: - Summary: HiveSchemaConverter for Parquet doesn't translate TINYINT and SMALLINT into proper Parquet types Key: HIVE-14294 URL: https://issues.apache.org/jira/browse/HIVE-14294

[jira] [Updated] (SPARK-16633) lag/lead does not return the default value when the offset row does not exist

2016-07-19 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16633: --- Attachment: window_function_bug.html JIRA went down right before [~yhuai] tried to upload

[jira] [Commented] (SPARK-16576) Move plan SQL generation code from SQLBuilder into logical operators

2016-07-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15381969#comment-15381969 ] Cheng Lian commented on SPARK-16576: [~rxin] I wrote the first version of the {{SQLBuilder

[jira] [Resolved] (SPARK-16529) SQLTestUtils.withTempDatabase should set `default` database before dropping

2016-07-14 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16529. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 14184 [https

[jira] [Updated] (SPARK-16529) SQLTestUtils.withTempDatabase should set `default` database before dropping

2016-07-14 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16529: --- Assignee: Dongjoon Hyun > SQLTestUtils.withTempDatabase should set `default` database bef

[jira] [Resolved] (SPARK-16448) RemoveAliasOnlyProject should not remove alias with metadata

2016-07-14 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16448. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14106 [https

[jira] [Updated] (SPARK-16343) Improve the PushDownPredicate rule to pushdown predicates currectly in non-deterministic condition

2016-07-14 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16343: --- Assignee: Jiang Xingbo > Improve the PushDownPredicate rule to pushdown predicates currec

[jira] [Updated] (SPARK-16343) Improve the PushDownPredicate rule to pushdown predicates currectly in non-deterministic condition

2016-07-14 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16343: --- Affects Version/s: 2.0.0 > Improve the PushDownPredicate rule to pushdown predicates currec

[jira] [Resolved] (SPARK-16343) Improve the PushDownPredicate rule to pushdown predicates currectly in non-deterministic condition

2016-07-14 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16343. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14012 [https

[jira] [Resolved] (SPARK-16303) Update SQL examples and programming guide for Scala and Java language bindings

2016-07-13 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16303. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 14119 [https

[jira] [Resolved] (SPARK-16381) Update SQL examples and programming guide for R language binding

2016-07-11 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16381. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 14082 [https

[jira] [Comment Edited] (SPARK-16344) Array of struct with a single field name "element" can't be decoded from Parquet files written by Spark 1.6+

2016-07-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15369481#comment-15369481 ] Cheng Lian edited comment on SPARK-16344 at 7/10/16 8:07 AM: - Thanks

[jira] [Commented] (SPARK-16344) Array of struct with a single field name "element" can't be decoded from Parquet files written by Spark 1.6+

2016-07-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15369481#comment-15369481 ] Cheng Lian commented on SPARK-16344: Thanks to [~rdblue]'s comment about why there're two different

[jira] [Commented] (SPARK-16344) Array of struct with a single field name "element" can't be decoded from Parquet files written by Spark 1.6+

2016-07-09 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15369032#comment-15369032 ] Cheng Lian commented on SPARK-16344: I was re-thinking about [~rdblue]'s comment above, and tried

Re: Reply: Bug about reading parquet files

2016-07-09 Thread Cheng Lian
According to our offline discussion, the target table consists of 1M+ small Parquet files (~12M on average). The OOM occurred on the driver side while listing input files. My theory is that the total size of all listed FileStatus objects is too large for the driver and caused the OOM.

[jira] [Updated] (PARQUET-655) The LogicalTypes.md link in README.md points to the old Parquet GitHub repository

2016-07-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated PARQUET-655: --- Component/s: parquet-format > The LogicalTypes.md link in README.md points to the old Parquet Git

[jira] [Created] (PARQUET-655) The LogicalTypes.md link in README.md points to the old Parquet GitHub repository

2016-07-08 Thread Cheng Lian (JIRA)
Cheng Lian created PARQUET-655: -- Summary: The LogicalTypes.md link in README.md points to the old Parquet GitHub repository Key: PARQUET-655 URL: https://issues.apache.org/jira/browse/PARQUET-655

Re: Bug about reading parquet files

2016-07-08 Thread Cheng Lian
What's the Spark version? Could you please also attach the result of explain(extended = true)? On Fri, Jul 8, 2016 at 4:33 PM, Sea <261810...@qq.com> wrote: > I have a problem reading parquet files. > sql: > select count(1) from omega.dwd_native where year='2016' and month='07' > and day='05' and

[jira] [Comment Edited] (SPARK-16303) Update SQL examples and programming guide for Scala and Java language bindings

2016-07-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367345#comment-15367345 ] Cheng Lian edited comment on SPARK-16303 at 7/8/16 7:28 AM: Thanks

[jira] [Comment Edited] (SPARK-16303) Update SQL examples and programming guide for Scala and Java language bindings

2016-07-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367345#comment-15367345 ] Cheng Lian edited comment on SPARK-16303 at 7/8/16 7:27 AM: Thanks

[jira] [Commented] (SPARK-16303) Update SQL examples and programming guide for Scala and Java language bindings

2016-07-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367345#comment-15367345 ] Cheng Lian commented on SPARK-16303: Thanks for working on this! I'd suggest to send out the PR first

Re: parquet-mr filter pushdown

2016-07-08 Thread Cheng Lian
, this is somewhat related to the vectorized read API we're putting together a hackathon to tackle, so you may want to monitor that effort. rb On Thu, Jul 7, 2016 at 7:47 AM, Cheng Lian <l...@databricks.com> wrote: One of the commonly seen E

[jira] [Created] (PARQUET-654) Make record-level filtering optional

2016-07-08 Thread Cheng Lian (JIRA)
Cheng Lian created PARQUET-654: -- Summary: Make record-level filtering optional Key: PARQUET-654 URL: https://issues.apache.org/jira/browse/PARQUET-654 Project: Parquet Issue Type: Improvement

[jira] [Comment Edited] (SPARK-16344) Array of struct with a single field name "element" can't be decoded from Parquet files written by Spark 1.6+

2016-07-07 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15366069#comment-15366069 ] Cheng Lian edited comment on SPARK-16344 at 7/8/16 12:12 AM: - Thanks

Re: parquet-mr filter pushdown

2016-07-07 Thread Cheng Lian
One of the commonly seen ETL use cases of Spark is inferring schema automatically from JSON datasets and then converting them into Parquet. In similar use cases, schema evolution support can be crucial. Reading from Parquet files with different but compatible schemata is quite common. Schema

[jira] [Commented] (SPARK-16344) Array of struct with a single field name "element" can't be decoded from Parquet files written by Spark 1.6+

2016-07-07 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15366069#comment-15366069 ] Cheng Lian commented on SPARK-16344: Thanks for the detailed response! Spark SQL also has two

[jira] [Resolved] (SPARK-16400) Remove InSet filter pushdown from Parquet

2016-07-07 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16400. Resolution: Fixed Assignee: Reynold Xin Fix Version/s: 2.1.0 Resolved by https

[jira] [Commented] (SPARK-16381) Update SQL examples and programming guide for R language binding

2016-07-07 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365777#comment-15365777 ] Cheng Lian commented on SPARK-16381: For a specific release, usually, we only make a schedule

[jira] [Commented] (SPARK-16380) Update SQL examples and programming guide for Python language binding

2016-07-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365591#comment-15365591 ] Cheng Lian commented on SPARK-16380: [~wm624] Considering 2.0.0 RC2 has already been cut, it's

[jira] [Commented] (SPARK-16303) Update SQL examples and programming guide for Scala and Java language bindings

2016-07-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365590#comment-15365590 ] Cheng Lian commented on SPARK-16303: [~aokolnychyi] Considering 2.0.0 RC2 has already been cut, it's

[jira] [Commented] (SPARK-16381) Update SQL examples and programming guide for R language binding

2016-07-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365583#comment-15365583 ] Cheng Lian commented on SPARK-16381: Thanks for volunteering! I've assigned this ticket to you

[jira] [Updated] (SPARK-16381) Update SQL examples and programming guide for R language binding

2016-07-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16381: --- Assignee: Xin Ren > Update SQL examples and programming guide for R language bind

[jira] [Commented] (SPARK-16380) Update SQL examples and programming guide for Python language binding

2016-07-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365579#comment-15365579 ] Cheng Lian commented on SPARK-16380: I just noticed that I put "Scala" into the JIRA ti

[jira] [Updated] (SPARK-16380) Update SQL examples and programming guide for Python language binding

2016-07-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16380: --- Summary: Update SQL examples and programming guide for Python language binding (was: Update SQL

[jira] [Resolved] (SPARK-16388) Remove spark.sql.nativeView and spark.sql.nativeView.canonical config

2016-07-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16388. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14061 [https

[jira] [Updated] (SPARK-16381) Update SQL examples and programming guide for R language binding

2016-07-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16381: --- Labels: (was: Starter) > Update SQL examples and programming guide for R language bind

[jira] [Updated] (SPARK-16381) Update SQL examples and programming guide for R language binding

2016-07-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16381: --- Labels: Starter (was: ) > Update SQL examples and programming guide for R language bind

[jira] [Updated] (SPARK-16381) Update SQL examples and programming guide for R language binding

2016-07-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16381: --- Description: Please follow guidelines listed in this SPARK-16303 [comment|https://issues.apache.org

[jira] [Commented] (SPARK-16380) Update SQL examples and programming guide for Scala Python language binding

2016-07-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364057#comment-15364057 ] Cheng Lian commented on SPARK-16380: I've assigned this ticket to you. > Update SQL examp

[jira] [Updated] (SPARK-16380) Update SQL examples and programming guide for Scala Python language binding

2016-07-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16380: --- Assignee: Miao Wang > Update SQL examples and programming guide for Scala Python language bind

[jira] [Commented] (SPARK-16380) Update SQL examples and programming guide for Scala Python language binding

2016-07-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364055#comment-15364055 ] Cheng Lian commented on SPARK-16380: Thanks for volunteering! Following guidelines listed in SPARK

[jira] [Updated] (SPARK-16380) Update SQL examples and programming guide for Scala Python language binding

2016-07-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16380: --- Description: Please follow guidelines listed in this SPARK-16303 [comment|https://issues.apache.org

[jira] [Updated] (SPARK-16380) Update SQL examples and programming guide for Scala Python language binding

2016-07-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16380: --- Description: Please follow guidelines listed > Update SQL examples and programming guide for Sc

[jira] [Updated] (PARQUET-651) Parquet-avro fails to decode array of record with a single field name "element" correctly

2016-07-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated PARQUET-651: --- Affects Version/s: 1.9.0 > Parquet-avro fails to decode array of record with a single field n

[jira] [Resolved] (SPARK-16330) Null pointer getting count from avro file in mesos distributed

2016-07-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16330. Resolution: Invalid I'm resolving this issue as invalid since it's actually a spark-avro bug

[jira] [Commented] (SPARK-16330) Null pointer getting count from avro file in mesos distributed

2016-07-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362678#comment-15362678 ] Cheng Lian commented on SPARK-16330: Please find the root cause analysis in the comment area: https

[jira] [Assigned] (SPARK-16330) Null pointer getting count from avro file in mesos distributed

2016-07-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian reassigned SPARK-16330: -- Assignee: Cheng Lian > Null pointer getting count from avro file in mesos distribu

[jira] [Updated] (SPARK-16303) Update SQL examples and programming guide for Scala and Java language bindings

2016-07-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16303: --- Assignee: Anton Okolnychyi (was: Cheng Lian) > Update SQL examples and programming guide for Sc

[jira] [Commented] (SPARK-16303) Update SQL examples and programming guide for Scala and Java language bindings

2016-07-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362625#comment-15362625 ] Cheng Lian commented on SPARK-16303: Just assigned this ticket to you. > Update SQL examp

[jira] [Created] (SPARK-16380) Update SQL examples and programming guide for Scala Python language binding

2016-07-05 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-16380: -- Summary: Update SQL examples and programming guide for Scala Python language binding Key: SPARK-16380 URL: https://issues.apache.org/jira/browse/SPARK-16380 Project

[jira] [Created] (SPARK-16381) Update SQL examples and programming guide for R language binding

2016-07-05 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-16381: -- Summary: Update SQL examples and programming guide for R language binding Key: SPARK-16381 URL: https://issues.apache.org/jira/browse/SPARK-16381 Project: Spark

[jira] [Updated] (SPARK-16303) Update SQL examples and programming guide

2016-07-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16303: --- Summary: Update SQL examples and programming guide (was: Update SQL examples and programming guide

[jira] [Commented] (SPARK-16303) Update SQL examples and programming guide

2016-07-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362621#comment-15362621 ] Cheng Lian commented on SPARK-16303: Please feel free to split the task. I'm going to narrow scope

[jira] [Updated] (SPARK-16303) Update SQL examples and programming guide for Scala and Java language bindings

2016-07-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16303: --- Summary: Update SQL examples and programming guide for Scala and Java language bindings

[jira] [Commented] (SPARK-16303) Update SQL examples and programming guide

2016-07-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362579#comment-15362579 ] Cheng Lian commented on SPARK-16303: Here's the aforementioned WIP branch, which only contains
