[jira] [Commented] (SPARK-17714) ClassCircularityError is thrown when using org.apache.spark.util.Utils.classForName 

2017-02-07 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856638#comment-15856638 ] Cheng Lian commented on SPARK-17714: Although I've no idea why this error occurs, it seems

[jira] [Resolved] (SPARK-18539) Cannot filter by nonexisting column in parquet file

2017-02-03 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-18539. Resolution: Fixed Assignee: Dongjoon Hyun Target Version/s: 2.2.0 > Can

[jira] [Commented] (SPARK-18539) Cannot filter by nonexisting column in parquet file

2017-02-03 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851965#comment-15851965 ] Cheng Lian commented on SPARK-18539: SPARK-19409 upgraded parquet-mr to 1.8.2 and fixed this issue

[jira] [Commented] (SPARK-18539) Cannot filter by nonexisting column in parquet file

2017-01-26 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840186#comment-15840186 ] Cheng Lian commented on SPARK-18539: [~viirya], sorry for the (super) late reply. What I mentioned

Re: [VOTE] Release Apache Parquet 1.8.2 RC1

2017-01-23 Thread Cheng Lian
PS: The testing Spark branch can be found here https://github.com/apache/spark/compare/4a11d029dc6abeb98fef5725d3d446a3eb5deddf...liancheng:try-parquet-1.8.2-rc1 On 1/23/17 1:36 PM, Cheng Lian wrote: +1 if the maven-remote-resource-plugin issue is irrelevant. Checks performed: * Checked

Re: [VOTE] Release Apache Parquet 1.8.2 RC1

2017-01-23 Thread Cheng Lian
ddf> that upgrades parquet-mr to 1.8.2-rc1 and ran related tests locally. Cheng On 1/23/17 12:59 PM, Cheng Lian wrote: One thing I hit is that I have to add | org.apache.maven.plugins maven-remote-resources-plugin 1.5 true | in pom.xml to build parquet-mr 1.8.2-rc1. Otherwise I consistent

Re: [VOTE] Release Apache Parquet 1.8.2 RC1

2017-01-23 Thread Cheng Lian
9.0. Considering other people built the release just fine, did I miss something here? Cheng On 1/23/17 12:33 PM, Julien Le Dem wrote: Thank you Cheng! On Mon, Jan 23, 2017 at 12:02 PM, Cheng Lian <lian.cs@gmail.com> wrote: Sorry for being lat

Re: [VOTE] Release Apache Parquet 1.8.2 RC1

2017-01-23 Thread Cheng Lian
Sorry for being late, I'm building a Spark branch based on the most recent master to test out 1.8.2-rc1, will post my result here ASAP. Cheng On 1/23/17 11:43 AM, Julien Le Dem wrote: Hi Spark dev, Here is the voting thread for parquet 1.8.2 release. Cheng or someone else we would appreciate

Re: Parquet patch release

2017-01-09 Thread Cheng Lian
Finished reviewing the list and it LGTM now (left comments in the spreadsheet and Ryan already made corresponding changes). Ryan - Thanks a lot for pushing this and making it happen! Cheng On 1/6/17 3:46 PM, Ryan Blue wrote: Last month, there was interest in a Parquet patch release on PR

Re: 1.8.2 patch release

2017-01-09 Thread Cheng Lian
Already reviewed the list and left comments on the spreadsheet. The current list LGTM. Thanks Ryan for doing this! Cheng On 1/9/17 1:36 PM, Ryan Blue wrote: I posted a note on the Spark list pointing to this thread. I also know most of the issues that they're interested in fixing, so I've

[jira] [Commented] (HIVE-11611) A bad performance regression issue with Parquet happens if Hive does not select any columns

2017-01-09 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-11611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813452#comment-15813452 ] Cheng Lian commented on HIVE-11611: --- While trying to fix a similar issue without upgrading Parquet

[jira] [Resolved] (SPARK-19016) Document scalable partition handling feature in the programming guide

2016-12-30 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-19016. Resolution: Fixed Fix Version/s: 2.2.0 2.1.1 Issue resolved by pull

[jira] [Created] (SPARK-19016) Document scalable partition handling feature in the programming guide

2016-12-28 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-19016: -- Summary: Document scalable partition handling feature in the programming guide Key: SPARK-19016 URL: https://issues.apache.org/jira/browse/SPARK-19016 Project: Spark

[jira] [Created] (SPARK-18956) Python API should reuse existing SparkSession while creating new SQLContext instances

2016-12-20 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-18956: -- Summary: Python API should reuse existing SparkSession while creating new SQLContext instances Key: SPARK-18956 URL: https://issues.apache.org/jira/browse/SPARK-18956

[jira] [Updated] (SPARK-18950) Report conflicting fields when merging two StructTypes.

2016-12-20 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-18950: --- Labels: starter (was: ) > Report conflicting fields when merging two StructTy

[jira] [Created] (SPARK-18950) Report conflicting fields when merging two StructTypes.

2016-12-20 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-18950: -- Summary: Report conflicting fields when merging two StructTypes. Key: SPARK-18950 URL: https://issues.apache.org/jira/browse/SPARK-18950 Project: Spark Issue

Re: How to reflect dynamic registration udf?

2016-12-16 Thread Cheng Lian
Could you please provide more context about what you are trying to do here? On Thu, Dec 15, 2016 at 6:27 PM 李斌松 wrote: > How to reflect dynamic registration udf? > > java.lang.UnsupportedOperationException: Schema for type _$13 is not > supported > at >

[jira] [Updated] (SPARK-18753) Inconsistent behavior after writing to parquet files

2016-12-14 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-18753: --- Fix Version/s: 2.2.0 > Inconsistent behavior after writing to parquet fi

[jira] [Updated] (SPARK-18753) Inconsistent behavior after writing to parquet files

2016-12-14 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-18753: --- Assignee: Hyukjin Kwon > Inconsistent behavior after writing to parquet fi

[jira] [Resolved] (SPARK-18753) Inconsistent behavior after writing to parquet files

2016-12-14 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-18753. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 16184 [https

[jira] [Comment Edited] (SPARK-18712) keep the order of sql expression and support short circuit

2016-12-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724381#comment-15724381 ] Cheng Lian edited comment on SPARK-18712 at 12/6/16 5:10 AM: - I think

[jira] [Commented] (SPARK-18712) keep the order of sql expression and support short circuit

2016-12-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724381#comment-15724381 ] Cheng Lian commented on SPARK-18712: I think the contract here is that for a DataFrame {{df}} and 1

[jira] [Commented] (SPARK-18539) Cannot filter by nonexisting column in parquet file

2016-12-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724013#comment-15724013 ] Cheng Lian commented on SPARK-18539: [~xwu0226], thanks for the new use case! [~viirya], I do think

[jira] [Updated] (SPARK-18730) Ask the build script to link to Jenkins test report page instead of full console output page when posting to GitHub

2016-12-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-18730: --- Priority: Minor (was: Major) > Ask the build script to link to Jenkins test report page inst

[jira] [Created] (SPARK-18730) Ask the build script to link to Jenkins test report page instead of full console output page when posting to GitHub

2016-12-05 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-18730: -- Summary: Ask the build script to link to Jenkins test report page instead of full console output page when posting to GitHub Key: SPARK-18730 URL: https://issues.apache.org/jira

[jira] [Commented] (SPARK-18539) Cannot filter by nonexisting column in parquet file

2016-12-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723781#comment-15723781 ] Cheng Lian commented on SPARK-18539: Please remind me if I missed anything important, otherwise, we

[jira] [Comment Edited] (SPARK-18539) Cannot filter by nonexisting column in parquet file

2016-12-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723747#comment-15723747 ] Cheng Lian edited comment on SPARK-18539 at 12/5/16 11:43 PM: -- [~v-gerasimov

[jira] [Commented] (SPARK-18539) Cannot filter by nonexisting column in parquet file

2016-12-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723747#comment-15723747 ] Cheng Lian commented on SPARK-18539: [~v-gerasimov], [~smilegator], and [~xwu0226], after some

[jira] [Commented] (SPARK-18539) Cannot filter by nonexisting column in parquet file

2016-12-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723718#comment-15723718 ] Cheng Lian commented on SPARK-18539: As commented on GitHub, there're two issues right now

[jira] [Commented] (SPARK-18539) Cannot filter by nonexisting column in parquet file

2016-12-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15722891#comment-15722891 ] Cheng Lian commented on SPARK-18539: Haven't looked deeply into this issue, but my hunch

[jira] [Assigned] (SPARK-17213) Parquet String Pushdown for Non-Eq Comparisons Broken

2016-12-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian reassigned SPARK-17213: -- Assignee: Cheng Lian > Parquet String Pushdown for Non-Eq Comparisons Bro

[jira] [Commented] (SPARK-17213) Parquet String Pushdown for Non-Eq Comparisons Broken

2016-12-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712707#comment-15712707 ] Cheng Lian commented on SPARK-17213: Agree that we should disable string and binary filter push down

[jira] [Resolved] (SPARK-9876) Upgrade parquet-mr to 1.8.1

2016-12-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-9876. --- Resolution: Fixed Fix Version/s: 2.1.0 > Upgrade parquet-mr to 1.

[jira] [Commented] (SPARK-18251) DataSet API | RuntimeException: Null value appeared in non-nullable field when holding Option Case Class

2016-11-30 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15709869#comment-15709869 ] Cheng Lian commented on SPARK-18251: One more comment about why we shouldn't allow a {{Option\[T

[jira] [Updated] (SPARK-18251) DataSet API | RuntimeException: Null value appeared in non-nullable field when holding Option Case Class

2016-11-30 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-18251: --- Assignee: Wenchen Fan > DataSet API | RuntimeException: Null value appeared in non-nullable fi

[jira] [Resolved] (SPARK-18251) DataSet API | RuntimeException: Null value appeared in non-nullable field when holding Option Case Class

2016-11-30 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-18251. Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 15979 [https

[jira] [Comment Edited] (SPARK-18403) ObjectHashAggregateSuite is being flaky (occasional OOM errors)

2016-11-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15684659#comment-15684659 ] Cheng Lian edited comment on SPARK-18403 at 11/22/16 6:54 AM: -- Here

[jira] [Commented] (SPARK-18403) ObjectHashAggregateSuite is being flaky (occasional OOM errors)

2016-11-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15685389#comment-15685389 ] Cheng Lian commented on SPARK-18403: Figured it out. It's caused by a false sharing issue inside

[jira] [Commented] (SPARK-18403) ObjectHashAggregateSuite is being flaky (occasional OOM errors)

2016-11-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15684659#comment-15684659 ] Cheng Lian commented on SPARK-18403: Here is a minimal test case (add

[jira] [Commented] (SPARK-11785) When deployed against remote Hive metastore with lower versions, JDBC metadata calls throws exception

2016-11-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677469#comment-15677469 ] Cheng Lian commented on SPARK-11785: I'm not sure which PR fixes this issue, though. > W

[jira] [Commented] (SPARK-11785) When deployed against remote Hive metastore with lower versions, JDBC metadata calls throws exception

2016-11-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677468#comment-15677468 ] Cheng Lian commented on SPARK-11785: Confirmed that this is no longer an issue for 2.1 > W

[jira] [Comment Edited] (SPARK-18251) DataSet API | RuntimeException: Null value appeared in non-nullable field when holding Option Case Class

2016-11-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677396#comment-15677396 ] Cheng Lian edited comment on SPARK-18251 at 11/18/16 6:38 PM: -- I'd prefer

[jira] [Comment Edited] (SPARK-18251) DataSet API | RuntimeException: Null value appeared in non-nullable field when holding Option Case Class

2016-11-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677396#comment-15677396 ] Cheng Lian edited comment on SPARK-18251 at 11/18/16 6:37 PM: -- I'd prefer

[jira] [Commented] (SPARK-18251) DataSet API | RuntimeException: Null value appeared in non-nullable field when holding Option Case Class

2016-11-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677396#comment-15677396 ] Cheng Lian commented on SPARK-18251: I'd prefer option 1 because of consistency of the semantics

[jira] [Created] (SPARK-18451) Always set -XX:+HeapDumpOnOutOfMemoryError for Spark tests

2016-11-15 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-18451: -- Summary: Always set -XX:+HeapDumpOnOutOfMemoryError for Spark tests Key: SPARK-18451 URL: https://issues.apache.org/jira/browse/SPARK-18451 Project: Spark Issue

Re: Is `randomized aggregation test` testsuite stable?

2016-11-10 Thread Cheng Lian
JIRA: https://issues.apache.org/jira/browse/SPARK-18403 PR: https://github.com/apache/spark/pull/15845 Will merge it as soon as Jenkins passes. Cheng On 11/10/16 11:30 AM, Dongjoon Hyun wrote: Great! Thank you so much, Cheng! Bests, Dongjoon. On 2016-11-10 11:21 (-0800), Cheng Lian

[jira] [Created] (SPARK-18403) ObjectHashAggregateSuite is being flaky (occasional OOM errors)

2016-11-10 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-18403: -- Summary: ObjectHashAggregateSuite is being flaky (occasional OOM errors) Key: SPARK-18403 URL: https://issues.apache.org/jira/browse/SPARK-18403 Project: Spark

Re: Is `randomized aggregation test` testsuite stable?

2016-11-10 Thread Cheng Lian
Hey Dongjoon, Thanks for reporting. I'm looking into these OOM errors. Already reproduced them locally but haven't figured out the root cause yet. Gonna disable them temporarily for now. Sorry for the inconvenience! Cheng On 11/10/16 8:48 AM, Dongjoon Hyun wrote: Hi, All. Recently, I

[jira] [Commented] (SPARK-18390) Optimized plan tried to use Cartesian join when it is not enabled

2016-11-09 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652202#comment-15652202 ] Cheng Lian commented on SPARK-18390: I think this issue has already been fixed by SPARK-17298

[jira] [Updated] (SPARK-18390) Optimized plan tried to use Cartesian join when it is not enabled

2016-11-09 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-18390: --- Description: {code} val df2 = spark.range(1e9.toInt).withColumn("one", lit(1)) val df3 = s

[jira] [Updated] (SPARK-18338) ObjectHashAggregateSuite fails under Maven builds

2016-11-07 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-18338: --- Description: Test case initialization order under Maven and SBT is different. Maven always creates

[jira] [Created] (SPARK-18338) ObjectHashAggregateSuite fails under Maven builds

2016-11-07 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-18338: -- Summary: ObjectHashAggregateSuite fails under Maven builds Key: SPARK-18338 URL: https://issues.apache.org/jira/browse/SPARK-18338 Project: Spark Issue Type

[jira] [Updated] (SPARK-17972) Query planning slows down dramatically for large query plans even when sub-trees are cached

2016-11-02 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-17972: --- Description: The following Spark shell snippet creates a series of query plans that grow

[jira] [Resolved] (SPARK-11879) Checkpoint support for DataFrame/Dataset

2016-11-02 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-11879. Resolution: Duplicate > Checkpoint support for DataFrame/Data

[jira] [Commented] (SPARK-11879) Checkpoint support for DataFrame/Dataset

2016-11-02 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630537#comment-15630537 ] Cheng Lian commented on SPARK-11879: Sorry that I didn't notice this ticket while working on SPARK

[jira] [Commented] (SPARK-18209) More robust view canonicalization without full SQL expansion

2016-11-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15626823#comment-15626823 ] Cheng Lian commented on SPARK-18209: One problem of the proposed approach is that our SQL parser

[jira] [Created] (SPARK-18186) Migrate HiveUDAFFunction to TypedImperativeAggregate for partial aggregation support

2016-10-31 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-18186: -- Summary: Migrate HiveUDAFFunction to TypedImperativeAggregate for partial aggregation support Key: SPARK-18186 URL: https://issues.apache.org/jira/browse/SPARK-18186

[jira] [Commented] (SPARK-18053) ARRAY equality is broken in Spark 2.0

2016-10-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602974#comment-15602974 ] Cheng Lian commented on SPARK-18053: Yea, reproduced using 2.0. > ARRAY equality is broken in Sp

[jira] [Commented] (SPARK-18053) ARRAY equality is broken in Spark 2.0

2016-10-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602969#comment-15602969 ] Cheng Lian commented on SPARK-18053: Hm, the user mailing list thread said that it fails under 2.0

Re: Writing to Parquet Job turns to wait mode after even completion of job

2016-10-24 Thread Cheng Lian
On 10/22/16 6:18 AM, Steve Loughran wrote: ... On Sat, Oct 22, 2016 at 3:41 AM, Cheng Lian <lian.cs@gmail.com> wrote: What version of Spark are you using and how many output files does the job write out? By default, Spark version

Re: [Spark 2.0.0] error when unioning to an empty dataset

2016-10-24 Thread Cheng Lian
lines? Exactly. On Fri, Oct 21, 2016 at 3:39 PM Cheng Lian <lian.cs@gmail.com> wrote: Efe - You probably hit this bug: https://issues.apache.org/jira/browse/SPARK-18058 On 10/21/16 2:03 AM, Agraj Mangal wrote: I have see

Re: RDD groupBy() then random sort each group ?

2016-10-21 Thread Cheng Lian
I think it would be much easier to use the DataFrame API to do this, by doing a local sort using randn() as the key. For example, in Spark 2.0: val df = spark.range(100) val shuffled = df.repartition($"id" % 10).sortWithinPartitions(randn(42)) Replace df with a DataFrame wrapping your RDD, and $"id" % 10

Re: [Spark 2.0.0] error when unioning to an empty dataset

2016-10-21 Thread Cheng Lian
Efe - You probably hit this bug: https://issues.apache.org/jira/browse/SPARK-18058 On 10/21/16 2:03 AM, Agraj Mangal wrote: I have seen this error sometimes when the elements in the schema have different nullabilities. Could you print the schema for data and for

Re: How to iterate the element of an array in DataFrame?

2016-10-21 Thread Cheng Lian
You may either use the SQL functions "array" and "named_struct" or define a case class with the expected field names. Cheng On 10/21/16 2:45 AM, 颜发才(Yan Facai) wrote: My expectation is: root |-- tag: vector namely, I want to extract from: [[tagCategory_060, 0.8], [tagCategory_029, 0.7]]| to:

Re: Dataframe schema...

2016-10-21 Thread Cheng Lian
Yea, confirmed. While analyzing unions, we treat StructTypes with different field nullabilities as incompatible types and throw this error. Opened https://issues.apache.org/jira/browse/SPARK-18058 to track this issue. Thanks for reporting! Cheng On 10/21/16 3:15 PM, Cheng Lian wrote: Hi

[jira] [Created] (SPARK-18058) AnalysisException may be thrown when union two DFs whose struct fields have different nullability

2016-10-21 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-18058: -- Summary: AnalysisException may be thrown when union two DFs whose struct fields have different nullability Key: SPARK-18058 URL: https://issues.apache.org/jira/browse/SPARK-18058

Re: Dataframe schema...

2016-10-21 Thread Cheng Lian
Hi Muthu, What version of Spark are you using? This seems to be a bug in the analysis phase. Cheng On 10/21/16 12:50 PM, Muthu Jayakumar wrote: Sorry for the late response. Here is what I am seeing... Schema from parquet file. d1.printSchema() root |-- task_id: string (nullable =

Re: Writing to Parquet Job turns to wait mode after even completion of job

2016-10-21 Thread Cheng Lian
What version of Spark are you using and how many output files does the job write out? By default, Spark versions before 1.6 (exclusive) write Parquet summary files when committing the job. This process reads footers from all Parquet files in the destination directory and merges them

[jira] [Updated] (SPARK-17949) Introduce a JVM object based aggregate operator

2016-10-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-17949: --- Description: The new Tungsten execution engine has very robust memory management and speed

[jira] [Updated] (SPARK-17949) Introduce a JVM object based aggregate operator

2016-10-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-17949: --- Description: The new Tungsten execution engine has very robust memory management and speed

Re: Where condition on columns of Arrays does no longer work in spark 2

2016-10-21 Thread Cheng Lian
Thanks for reporting! It's a bug, just filed a ticket to track it: https://issues.apache.org/jira/browse/SPARK-18053 Cheng On 10/20/16 1:54 AM, filthysocks wrote: I have a Column in a DataFrame that contains Arrays and I wanna filter for equality. It does work fine in spark 1.6 but not in

[jira] [Updated] (SPARK-18053) ARRAY equality is broken in Spark 2.0

2016-10-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-18053: --- Labels: correctness (was: ) > ARRAY equality is broken in Spark

[jira] [Updated] (SPARK-18053) ARRAY equality is broken in Spark 2.0

2016-10-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-18053: --- Description: The following Spark shell reproduces this issue: {code} case class Test(a: Seq[Int

[jira] [Created] (SPARK-18053) ARRAY equality is broken in Spark 2.0

2016-10-21 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-18053: -- Summary: ARRAY equality is broken in Spark 2.0 Key: SPARK-18053 URL: https://issues.apache.org/jira/browse/SPARK-18053 Project: Spark Issue Type: Bug

[jira] [Resolved] (SPARK-18012) Simplify WriterContainer code

2016-10-19 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-18012. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 15551 [https

[jira] [Updated] (SPARK-17949) Introduce a JVM object based aggregate operator

2016-10-19 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-17949: --- Attachment: [Design Doc] Support for Arbitrary Aggregation States.pdf > Introduce a JVM object ba

[jira] [Updated] (SPARK-17949) Introduce a JVM object based aggregate operator

2016-10-19 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-17949: --- Attachment: (was: [Design Doc] Support for Arbitrary Aggregation States.pdf) > Introduce a

[jira] [Updated] (SPARK-17949) Introduce a JVM object based aggregate operator

2016-10-19 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-17949: --- Attachment: [Design Doc] Support for Arbitrary Aggregation States.pdf > Introduce a JVM object ba

[jira] [Created] (PARQUET-754) Deprecate the "strict" argument in MessageType.union()

2016-10-17 Thread Cheng Lian (JIRA)
Cheng Lian created PARQUET-754: -- Summary: Deprecate the "strict" argument in MessageType.union() Key: PARQUET-754 URL: https://issues.apache.org/jira/browse/PARQUET-754 Project: Parquet

[jira] [Commented] (PARQUET-753) GroupType.union() doesn't merge the original type

2016-10-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583942#comment-15583942 ] Cheng Lian commented on PARQUET-753: PARQUET-379 resolves the {{union}} issue related to primitive

Re: Consuming parquet files built with version 1.8.1

2016-10-17 Thread Cheng Lian
Hi Dinesh, Thanks for reporting. This is kinda weird and I can't reproduce this. Were you doing the experiments using a cleanly compiled Spark master branch? And I don't think you have to use parquet-mr 1.8.1 to read Parquet files generated using parquet-mr 1.8.1 unless you are using something not

[jira] [Created] (SPARK-17972) Query planning slows down dramatically for large query plans even when sub-trees are cached

2016-10-17 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-17972: -- Summary: Query planning slows down dramatically for large query plans even when sub-trees are cached Key: SPARK-17972 URL: https://issues.apache.org/jira/browse/SPARK-17972

[jira] [Assigned] (SPARK-17949) Introduce a JVM object based aggregate operator

2016-10-14 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian reassigned SPARK-17949: -- Assignee: Cheng Lian > Introduce a JVM object based aggregate opera

[jira] [Commented] (SPARK-10954) Parquet version in the "created_by" metadata field of Parquet files written by Spark 1.5 and 1.6 is wrong

2016-10-14 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576623#comment-15576623 ] Cheng Lian commented on SPARK-10954: [~hyukjin.kwon], yes, confirmed. Thanks! > Parquet vers

[jira] [Closed] (SPARK-9783) Use SqlNewHadoopRDD in JSONRelation to eliminate extra refresh() call

2016-10-14 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian closed SPARK-9783. - Resolution: Not A Problem This issue is no longer a problem since we re-implemented the JSON data source

[jira] [Commented] (SPARK-9783) Use SqlNewHadoopRDD in JSONRelation to eliminate extra refresh() call

2016-10-14 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576523#comment-15576523 ] Cheng Lian commented on SPARK-9783: --- Yes, I'm closing this. Thanks! > Use SqlNewHadoop

[jira] [Commented] (SPARK-17636) Parquet filter push down doesn't handle struct fields

2016-10-14 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576513#comment-15576513 ] Cheng Lian commented on SPARK-17636: [~MasterDDT], yes, just as what [~hyukjin.kwon] explained

[jira] [Updated] (SPARK-17636) Parquet filter push down doesn't handle struct fields

2016-10-14 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-17636: --- Description: There's a *PushedFilters* for a simple numeric field, but not for a numeric field

[jira] [Comment Edited] (SPARK-17845) Improve window function frame boundary API in DataFrame

2016-10-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561376#comment-15561376 ] Cheng Lian edited comment on SPARK-17845 at 10/10/16 6:43 AM: -- One thing

[jira] [Comment Edited] (SPARK-17845) Improve window function frame boundary API in DataFrame

2016-10-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561376#comment-15561376 ] Cheng Lian edited comment on SPARK-17845 at 10/10/16 6:00 AM: -- One thing

[jira] [Commented] (SPARK-17845) Improve window function frame boundary API in DataFrame

2016-10-09 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561376#comment-15561376 ] Cheng Lian commented on SPARK-17845: One thing is that ANSI SQL also allows using arbitrary integral

Re: welcoming Xiao Li as a committer

2016-10-04 Thread Cheng Lian
Congratulations!!! Cheng On Tue, Oct 4, 2016 at 1:46 PM, Reynold Xin wrote: > Hi all, > > Xiao Li, aka gatorsmile, has recently been elected as an Apache Spark > committer. Xiao has been a super active contributor to Spark SQL. Congrats > and welcome, Xiao! > > - Reynold >

[jira] [Commented] (SPARK-17725) Spark should not write out parquet files with schema containing non-nullable fields

2016-09-29 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15533109#comment-15533109 ] Cheng Lian commented on SPARK-17725: Reproducing this issue by writing a Parquet file using: {code

[jira] [Resolved] (SPARK-16516) Support for pushing down filters for decimal and timestamp types in ORC

2016-09-27 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16516. Resolution: Fixed Issue resolved by pull request 14172 [https://github.com/apache/spark/pull/14172

[jira] [Updated] (SPARK-16777) Parquet schema converter depends on deprecated APIs

2016-09-27 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16777: --- Fix Version/s: (was: 2.2.0) 2.1.0 > Parquet schema converter depe

[jira] [Updated] (SPARK-16516) Support for pushing down filters for decimal and timestamp types in ORC

2016-09-27 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16516: --- Fix Version/s: 2.1.0 > Support for pushing down filters for decimal and timestamp types in

[jira] [Updated] (SPARK-16516) Support for pushing down filters for decimal and timestamp types in ORC

2016-09-27 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16516: --- Assignee: Hyukjin Kwon > Support for pushing down filters for decimal and timestamp types in

[jira] [Updated] (SPARK-16777) Parquet schema converter depends on deprecated APIs

2016-09-27 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16777: --- Fix Version/s: 2.1.0 2.0.2 > Parquet schema converter depends on deprecated A

[jira] [Resolved] (SPARK-16777) Parquet schema converter depends on deprecated APIs

2016-09-27 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16777. Resolution: Fixed Fix Version/s: (was: 2.0.2) (was: 2.1.0

[jira] [Updated] (SPARK-16777) Parquet schema converter depends on deprecated APIs

2016-09-27 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16777: --- Assignee: Hyukjin Kwon > Parquet schema converter depends on deprecated A
