[jira] [Commented] (SPARK-13101) Dataset complex types mapping to DataFrame (element nullability) mismatch

2016-02-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126932#comment-15126932 ] Cheng Lian commented on SPARK-13101: The reason why 1.6.0 allows this illegal situation is that we

[jira] [Comment Edited] (SPARK-13101) Dataset complex types mapping to DataFrame (element nullability) mismatch

2016-02-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126894#comment-15126894 ] Cheng Lian edited comment on SPARK-13101 at 2/1/16 7:45 PM: [~deenar] I also

[jira] [Comment Edited] (SPARK-13101) Dataset complex types mapping to DataFrame (element nullability) mismatch

2016-02-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126890#comment-15126890 ] Cheng Lian edited comment on SPARK-13101 at 2/1/16 7:55 PM: I think

[jira] [Assigned] (SPARK-12718) SQL generation support for window functions

2016-02-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian reassigned SPARK-12718: -- Assignee: Xiao Li > SQL generation support for window functi

[jira] [Commented] (PARQUET-401) Deprecate Log and move to SLF4J Logger

2016-02-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127668#comment-15127668 ] Cheng Lian commented on PARQUET-401: Fix of this issue is nice to have but probably shouldn't block

[jira] [Commented] (SPARK-6319) Should throw analysis exception when using binary type in groupby/join

2016-02-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126668#comment-15126668 ] Cheng Lian commented on SPARK-6319: --- One possible but not necessarily the best workaround is to have

[jira] [Resolved] (PARQUET-495) Fix mismatches in Types class comments

2016-02-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved PARQUET-495. Resolution: Fixed Issue resolved by pull request 317 [https://github.com/apache/parquet-mr/pull

[jira] [Updated] (SPARK-13101) Dataset complex types mapping to DataFrame (element nullability) mismatch

2016-02-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-13101: --- Description: There seems to be a regression between 1.6.0 and 1.6.1 (snapshot build). By default

[jira] [Commented] (SPARK-12725) SQL generation suffers from name conficts introduced by some analysis rules

2016-01-31 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15125587#comment-15125587 ] Cheng Lian commented on SPARK-12725: There are other analysis rules that may use generated attributes

[jira] [Updated] (SPARK-12725) SQL generation suffers from name conficts introduced by some analysis rules

2016-01-31 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12725: --- Description: Some analysis rules generate auxiliary attribute references with the same name

[jira] [Commented] (SPARK-12624) When schema is specified, we should give better error message if actual row length doesn't match

2016-01-29 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124556#comment-15124556 ] Cheng Lian commented on SPARK-12624: Yes, it should. > When schema is specified, we should g

[jira] [Resolved] (PARQUET-432) Complete a todo for method ColumnDescriptor.compareTo()

2016-01-29 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved PARQUET-432. Resolution: Fixed Issue resolved by pull request 314 [https://github.com/apache/parquet-mr/pull

[jira] [Resolved] (SPARK-13050) Scalatest tags fail builds with the addition of the sketch module

2016-01-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-13050. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10954 [https

[jira] [Commented] (SPARK-12725) SQL generation suffers from name conficts introduced by some analysis rules

2016-01-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122013#comment-15122013 ] Cheng Lian commented on SPARK-12725: One possible solution I was thinking about is that we can add

[jira] [Commented] (SPARK-12723) Comprehensive SQL generation support for expressions

2016-01-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122102#comment-15122102 ] Cheng Lian commented on SPARK-12723: Most of this issue is probably fixed in SPARK-12799 and PR

[jira] [Commented] (SPARK-12727) SQL generation support for distinct aggregation patterns that fit DistinctAggregationRewriter analysis rule

2016-01-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122124#comment-15122124 ] Cheng Lian commented on SPARK-12727: It would be nice if we can recover the original distinct

[jira] [Updated] (SPARK-11012) Canonicalize view definitions

2016-01-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-11012: --- Description: In SPARK-10337, we added the first step of supporting view natively, which

[jira] [Updated] (SPARK-12719) SQL generation support for generators (including UDTF)

2016-01-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12719: --- Description: {{HiveCompatibilitySuite}} can be useful for bootstrapping test coverage. Please refer

[jira] [Updated] (SPARK-12721) SQL generation support for script transformation

2016-01-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12721: --- Description: {{HiveCompatibilitySuite}} can be useful for bootstrapping test coverage. Please refer

[jira] [Updated] (SPARK-12720) SQL generation support for cube, rollup, and grouping set

2016-01-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12720: --- Description: {{HiveCompatibilitySuite}} can be useful for bootstrapping test coverage. Please refer

[jira] [Updated] (SPARK-12718) SQL generation support for window functions

2016-01-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12718: --- Description: {{HiveWindowFunctionQuerySuite}} and {{HiveWindowFunctionQueryFileSuite}} can be useful

[jira] [Resolved] (SPARK-12401) Add support for enums in postgres

2016-01-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-12401. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10596 [https

[jira] [Updated] (SPARK-12401) Add support for enums in postgres

2016-01-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12401: --- Assignee: Takeshi Yamamuro > Add support for enums in postg

[jira] [Updated] (SPARK-13050) Scalatest tags fail builds with the addition of the sketch module

2016-01-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-13050: --- Description: Builds fail at the new sketch module when a scalatest tag is used. Found when using

[jira] [Updated] (SPARK-11955) Mark one side fields in merging schema for safely pushdowning filters in parquet

2016-01-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-11955: --- Assignee: Liang-Chi Hsieh > Mark one side fields in merging schema for safely pushdowning filt

[jira] [Resolved] (SPARK-11955) Mark one side fields in merging schema for safely pushdowning filters in parquet

2016-01-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-11955. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 9940 [https

[jira] [Commented] (SPARK-12725) SQL generation suffers from name conficts introduced by some analysis rules

2016-01-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122043#comment-15122043 ] Cheng Lian commented on SPARK-12725: Thanks, this also sounds good to me. Will try this approach

[jira] [Updated] (SPARK-11012) Canonicalize view definitions

2016-01-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-11012: --- Description: In SPARK-10337, we added the first step of supporting view natively, which

[jira] [Updated] (SPARK-13050) Scalatest tags fail builds with the addition of the sketch module

2016-01-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-13050: --- Assignee: Alex Bozarth > Scalatest tags fail builds with the addition of the sketch mod

[jira] [Created] (SPARK-13070) Points which physical file is the trouble maker when Parquet schema merging fails

2016-01-28 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-13070: -- Summary: Points which physical file is the trouble maker when Parquet schema merging fails Key: SPARK-13070 URL: https://issues.apache.org/jira/browse/SPARK-13070

[jira] [Updated] (SPARK-13070) Points out which physical file is the trouble maker when Parquet schema merging fails

2016-01-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-13070: --- Summary: Points out which physical file is the trouble maker when Parquet schema merging fails

Re: Parquet for very wide table

2016-01-25 Thread Cheng Lian
Aside from Nong's comment, I think PARQUET-222, where we discussed a performance issue of writing wide tables, can be helpful. Cheng On 1/23/16 4:53 PM, Nong Li wrote: I expect this to be difficult. This is roughly 3 orders of magnitude more than even a typical wide table use case. Answers

Re: cast column string -> timestamp in Parquet file

2016-01-25 Thread Cheng Lian
The following snippet may help: sqlContext.read.parquet(path).withColumn("col_ts", $"col".cast(TimestampType)).drop("col") Cheng On 1/21/16 6:58 AM, Muthu Jayakumar wrote: DataFrame and udf. This may be more performant than doing an RDD transformation as you'll only transform just the

Re: Parquet for very wide table

2016-01-25 Thread Cheng Lian
: Thanks Cheng, Nong. Data in the matrix is homogeneous (cells are booleans), so I don't expect to face memory-related issues. Is the limitation on the # of columns or memory issues caused by the # of columns? To me it sounds more like memory issues. On Mon, Jan 25, 2016 at 10:16 AM, Cheng Lian

[jira] [Updated] (SPARK-12624) When schema is specified, we should give better error message if actual row length doesn't match

2016-01-23 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12624: --- Summary: When schema is specified, we should give better error message if actual row length doesn't

[jira] [Commented] (SPARK-12624) When schema is specified, we should give better error message if actual row length doesn't match

2016-01-23 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114123#comment-15114123 ] Cheng Lian commented on SPARK-12624: Quoted Davies' offline comment {quote} We always raise

[jira] [Updated] (SPARK-12624) When schema is specified, we should give better error message if actual row length doesn't match

2016-01-23 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12624: --- Description: The following code snippet reproduces this issue: {code} from pyspark.sql.types import

[jira] [Issue Comment Deleted] (SPARK-12818) Implement Bloom filter and count-min sketch in DataFrames

2016-01-20 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12818: --- Comment: was deleted (was: User 'liancheng' has created a pull request for this issue: https

[jira] [Updated] (SPARK-12938) Bloom filter DataFrame API integration

2016-01-20 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12938: --- Assignee: Wenchen Fan (was: Cheng Lian) > Bloom filter DataFrame API integrat

[jira] [Updated] (SPARK-12936) Initial bloom filter implementation

2016-01-20 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12936: --- Assignee: Wenchen Fan (was: Cheng Lian) > Initial bloom filter implementat

[jira] [Created] (SPARK-12935) Count-min sketch DataFrame API integration

2016-01-20 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12935: -- Summary: Count-min sketch DataFrame API integration Key: SPARK-12935 URL: https://issues.apache.org/jira/browse/SPARK-12935 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-12933) Initial count-min sketch implementation

2016-01-20 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12933: -- Summary: Initial count-min sketch implementation Key: SPARK-12933 URL: https://issues.apache.org/jira/browse/SPARK-12933 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-12934) Count-min sketch serialization

2016-01-20 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12934: -- Summary: Count-min sketch serialization Key: SPARK-12934 URL: https://issues.apache.org/jira/browse/SPARK-12934 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-12938) Bloom filter DataFrame API integration

2016-01-20 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12938: -- Summary: Bloom filter DataFrame API integration Key: SPARK-12938 URL: https://issues.apache.org/jira/browse/SPARK-12938 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-12937) Bloom filter serialization

2016-01-20 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12937: -- Summary: Bloom filter serialization Key: SPARK-12937 URL: https://issues.apache.org/jira/browse/SPARK-12937 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-12936) Initial bloom filter implementation

2016-01-20 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12936: -- Summary: Initial bloom filter implementation Key: SPARK-12936 URL: https://issues.apache.org/jira/browse/SPARK-12936 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-12937) Bloom filter serialization

2016-01-20 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12937: --- Assignee: Wenchen Fan (was: Cheng Lian) > Bloom filter serializat

[jira] [Resolved] (SPARK-12560) SqlTestUtils.stripSparkFilter needs to copy utf8strings

2016-01-19 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-12560. Resolution: Fixed Assignee: Imran Rashid Resolved by https://github.com/apache/spark/pull

[jira] [Updated] (SPARK-12560) SqlTestUtils.stripSparkFilter needs to copy utf8strings

2016-01-19 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12560: --- Fix Version/s: 2.0.0 > SqlTestUtils.stripSparkFilter needs to copy utf8stri

[jira] [Resolved] (SPARK-12867) Nullability of Intersect can be stricter

2016-01-19 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-12867. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10812 [https

[jira] [Commented] (SPARK-12867) Nullability of Intersect can be stricter

2016-01-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105616#comment-15105616 ] Cheng Lian commented on SPARK-12867: Thanks for helping! Go ahead, please. I'm assigning this to you

[jira] [Updated] (SPARK-12867) Nullability of Intersect can be stricter

2016-01-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12867: --- Assignee: Xiao Li > Nullability of Intersect can be stric

[jira] [Created] (SPARK-12867) Nullability of Intersect can be stricter

2016-01-17 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12867: -- Summary: Nullability of Intersect can be stricter Key: SPARK-12867 URL: https://issues.apache.org/jira/browse/SPARK-12867 Project: Spark Issue Type: Bug

Re: DataFrame partitionBy to a single Parquet file (per partition)

2016-01-15 Thread Cheng Lian
You may try DataFrame.repartition(partitionExprs: Column*) to shuffle all data belonging to a single (data) partition into a single (RDD) partition: df.coalesce(1).repartition("entity", "year", "month", "day", "status").write.partitionBy("entity", "year", "month", "day",

[jira] [Commented] (SPARK-12403) "Simba Spark ODBC Driver 1.0" not working with 1.5.2 anymore

2016-01-12 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15094471#comment-15094471 ] Cheng Lian commented on SPARK-12403: Also, could you please provide the exact version number

[jira] [Assigned] (SPARK-12724) SQL generation support for persisted data source relations

2016-01-12 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian reassigned SPARK-12724: -- Assignee: Cheng Lian > SQL generation support for persisted data source relati

[jira] [Resolved] (SPARK-12724) SQL generation support for persisted data source relations

2016-01-12 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-12724. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10712 [https

[jira] [Commented] (SPARK-12403) "Simba Spark ODBC Driver 1.0" not working with 1.5.2 anymore

2016-01-12 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15094466#comment-15094466 ] Cheng Lian commented on SPARK-12403: Hi [~lunendl], I wonder what kind of Hive metastore were you

Re: parquet repartitions and parquet.enable.summary-metadata does not work

2016-01-12 Thread Cheng Lian
. Best, Gavin On Mon, Jan 11, 2016 at 4:31 PM, Cheng Lian <lian.cs@gmail.com> wrote: Hey Gavin, Could you please provide a snippet of your code to show how you disabled "parquet.enable.summary-metadata" and wrote th

Re: parquet repartitions and parquet.enable.summary-metadata does not work

2016-01-11 Thread Cheng Lian
Hey Gavin, Could you please provide a snippet of your code to show how you disabled "parquet.enable.summary-metadata" and wrote the files? In particular, you mentioned that you saw "3000 jobs" fail. Were you writing each Parquet file with an individual job? (Usually people use

[jira] [Updated] (SPARK-12742) org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already exists

2016-01-11 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12742: --- Assignee: Fei Wang > org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table alre

[jira] [Resolved] (SPARK-12742) org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already exists

2016-01-11 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-12742. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10682 [https

[jira] [Created] (SPARK-12725) SQL generation suffers from name conficts introduced by some analysis rules

2016-01-08 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12725: -- Summary: SQL generation suffers from name conficts introduced by some analysis rules Key: SPARK-12725 URL: https://issues.apache.org/jira/browse/SPARK-12725 Project

[jira] [Updated] (SPARK-12723) Comprehensive SQL generation support for expressions

2016-01-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12723: --- Description: Ensure that all built-in expressions can be mapped to its SQL representation

[jira] [Updated] (SPARK-12725) SQL generation suffers from name conficts introduced by some analysis rules

2016-01-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12725: --- Affects Version/s: 2.0.0 Target Version/s: 2.0.0 > SQL generation suffers from name confi

[jira] [Updated] (SPARK-12728) Integrate SQL generation feature with native view

2016-01-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12728: --- Affects Version/s: 2.0.0 Target Version/s: 2.0.0 > Integrate SQL generation feature with nat

[jira] [Updated] (SPARK-12727) SQL generation support for distinct aggregation patterns that fit DistinctAggregationRewriter analysis rule

2016-01-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12727: --- Affects Version/s: 2.0.0 Target Version/s: 2.0.0 > SQL generation support for disti

[jira] [Created] (SPARK-12720) SQL generation support for cube, rollup, and grouping set

2016-01-08 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12720: -- Summary: SQL generation support for cube, rollup, and grouping set Key: SPARK-12720 URL: https://issues.apache.org/jira/browse/SPARK-12720 Project: Spark Issue

[jira] [Created] (SPARK-12721) SQL generation support for script transformation

2016-01-08 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12721: -- Summary: SQL generation support for script transformation Key: SPARK-12721 URL: https://issues.apache.org/jira/browse/SPARK-12721 Project: Spark Issue Type: Sub

[jira] [Commented] (SPARK-11012) Canonicalize view definitions

2016-01-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090110#comment-15090110 ] Cheng Lian commented on SPARK-11012: Done. > Canonicalize view definiti

[jira] [Updated] (SPARK-11012) Canonicalize view definitions

2016-01-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-11012: --- Affects Version/s: 2.0.0 Target Version/s: 2.0.0 > Canonicalize view definiti

[jira] [Created] (SPARK-12728) Integrate SQL generation feature with native view

2016-01-08 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12728: -- Summary: Integrate SQL generation feature with native view Key: SPARK-12728 URL: https://issues.apache.org/jira/browse/SPARK-12728 Project: Spark Issue Type

[jira] [Updated] (SPARK-12718) SQL generation support for window functions

2016-01-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12718: --- Affects Version/s: 2.0.0 Target Version/s: 2.0.0 > SQL generation support for window functi

[jira] [Updated] (SPARK-12720) SQL generation support for cube, rollup, and grouping set

2016-01-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12720: --- Affects Version/s: 2.0.0 Target Version/s: 2.0.0 > SQL generation support for cube, rol

[jira] [Updated] (SPARK-12719) SQL generation support for generators (including UDTF)

2016-01-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12719: --- Affects Version/s: 2.0.0 Target Version/s: 2.0.0 > SQL generation support for generat

[jira] [Updated] (SPARK-12726) ParquetConversions doesn't always propagate metastore table identifier to ParquetRelation

2016-01-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12726: --- Description: (I hit this issue while working on SPARK-12593, but haven't got time to investigate

[jira] [Updated] (SPARK-12723) Comprehensive SQL generation support for expressions

2016-01-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12723: --- Description: Ensure that all built-in expressions can be mapped to its SQL representation

[jira] [Updated] (SPARK-12723) Comprehensive SQL generation support for expressions

2016-01-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12723: --- Description: Ensure that all built-in expressions can be mapped to its SQL representation

[jira] [Created] (SPARK-12727) SQL generation support for distinct aggregation patterns that fit DistinctAggregationRewriter analysis rule

2016-01-08 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12727: -- Summary: SQL generation support for distinct aggregation patterns that fit DistinctAggregationRewriter analysis rule Key: SPARK-12727 URL: https://issues.apache.org/jira/browse/SPARK

[jira] [Created] (SPARK-12718) SQL generation support for window functions

2016-01-08 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12718: -- Summary: SQL generation support for window functions Key: SPARK-12718 URL: https://issues.apache.org/jira/browse/SPARK-12718 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-12593) Convert basic resolved logical plans back to SQL query strings

2016-01-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12593: --- Summary: Convert basic resolved logical plans back to SQL query strings (was: Convert resolved

[jira] [Created] (SPARK-12719) SQL generation support for generators (including UDTF)

2016-01-08 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12719: -- Summary: SQL generation support for generators (including UDTF) Key: SPARK-12719 URL: https://issues.apache.org/jira/browse/SPARK-12719 Project: Spark Issue

[jira] [Created] (SPARK-12724) SQL generation support for persisted data source relations

2016-01-08 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12724: -- Summary: SQL generation support for persisted data source relations Key: SPARK-12724 URL: https://issues.apache.org/jira/browse/SPARK-12724 Project: Spark Issue

[jira] [Created] (SPARK-12723) Comprehensive SQL generation support for expressions

2016-01-08 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12723: -- Summary: Comprehensive SQL generation support for expressions Key: SPARK-12723 URL: https://issues.apache.org/jira/browse/SPARK-12723 Project: Spark Issue Type

[jira] [Created] (SPARK-12726) ParquetConversions doesn't always propagate metastore table identifier to ParquetRelation

2016-01-08 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12726: -- Summary: ParquetConversions doesn't always propagate metastore table identifier to ParquetRelation Key: SPARK-12726 URL: https://issues.apache.org/jira/browse/SPARK-12726

[jira] [Updated] (SPARK-12721) SQL generation support for script transformation

2016-01-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12721: --- Affects Version/s: 2.0.0 Target Version/s: 2.0.0 > SQL generation support for scr

[jira] [Updated] (SPARK-12723) Comprehensive SQL generation support for expressions

2016-01-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12723: --- Affects Version/s: 2.0.0 Target Version/s: 2.0.0 > Comprehensive SQL generation supp

[jira] [Updated] (SPARK-12724) SQL generation support for persisted data source relations

2016-01-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12724: --- Affects Version/s: 2.0.0 Target Version/s: 2.0.0 > SQL generation support for persisted d

[jira] [Created] (SPARK-12593) Convert resolved logical plans back to SQL query strings

2015-12-31 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12593: -- Summary: Convert resolved logical plans back to SQL query strings Key: SPARK-12593 URL: https://issues.apache.org/jira/browse/SPARK-12593 Project: Spark Issue

[jira] [Created] (SPARK-12592) TestHive.reset hides Spark testing logs

2015-12-31 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12592: -- Summary: TestHive.reset hides Spark testing logs Key: SPARK-12592 URL: https://issues.apache.org/jira/browse/SPARK-12592 Project: Spark Issue Type: Test

[jira] [Commented] (SPARK-5948) Support writing to partitioned table for the Parquet data source

2015-12-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073596#comment-15073596 ] Cheng Lian commented on SPARK-5948: --- It's the HadoopFsRelation based Parquet data source

Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-26 Thread Cheng Lian
+1 On 12/23/15 12:39 PM, Yin Huai wrote: +1 On Tue, Dec 22, 2015 at 8:10 PM, Denny Lee wrote: +1 On Tue, Dec 22, 2015 at 7:05 PM Aaron Davidson wrote: +1 On

[jira] [Created] (SPARK-12498) BooleanSimplification cleanup

2015-12-23 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12498: -- Summary: BooleanSimplification cleanup Key: SPARK-12498 URL: https://issues.apache.org/jira/browse/SPARK-12498 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-12478) Dataset fields of product types can't be null

2015-12-22 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069044#comment-15069044 ] Cheng Lian commented on SPARK-12478: I'm leaving this ticket open since we also need to backport

[jira] [Updated] (SPARK-12478) Dataset fields of product types can't be null

2015-12-22 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12478: --- Labels: backport-needed (was: ) > Dataset fields of product types can't be n

[jira] [Resolved] (SPARK-11164) Add InSet pushdown filter back for Parquet

2015-12-22 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-11164. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10278 [https

[jira] [Updated] (SPARK-11164) Add InSet pushdown filter back for Parquet

2015-12-22 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-11164: --- Assignee: Xiao Li > Add InSet pushdown filter back for Parq

[jira] [Updated] (SPARK-12478) Dataset fields of product types can't be null

2015-12-22 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12478: --- Summary: Dataset fields of product types can't be null (was: Dataset fields whose types are case

[jira] [Updated] (SPARK-12478) Dataset fields whose types are case classs can't be null

2015-12-22 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12478: --- Summary: Dataset fields whose types are case classs can't be null (was: Top level case class field

[jira] [Created] (SPARK-12478) Top level case class field of a Dataset can't be null

2015-12-22 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12478: -- Summary: Top level case class field of a Dataset can't be null Key: SPARK-12478 URL: https://issues.apache.org/jira/browse/SPARK-12478 Project: Spark Issue Type
