[jira] [Commented] (SPARK-37980) Extend METADATA column to support row indices for file based data sources

2022-02-01 Thread Cheng Lian (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17485215#comment-17485215 ] Cheng Lian commented on SPARK-37980: [~prakharjain09], as you've mentioned, it's not super

[jira] [Updated] (SPARK-31935) Hadoop file system config should be effective in data source options

2020-06-30 Thread Cheng Lian (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-31935: --- Affects Version/s: (was: 3.0.1) (was: 3.1.0

[jira] [Updated] (SPARK-26352) Join reordering should not change the order of output attributes

2020-05-29 Thread Cheng Lian (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-26352: --- Summary: Join reordering should not change the order of output attributes (was: join reordering

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Cheng Lian
Oh, actually, in order to decouple Hadoop 3.2 and Hive 2.3 upgrades, we will need a hive-2.3 profile anyway, no matter having the hive-1.2 profile or not. On Wed, Nov 20, 2019 at 3:33 PM Cheng Lian wrote: > Just to summarize my points: > >1. Let's still keep the Hive 1.2 dependency

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-20 Thread Cheng Lian
Sean, thanks for the corner cases you listed. They make a lot of sense. Now I do incline to have Hive 2.3 as the default version. Dongjoon, apologize if I didn't make it clear before. What made me concerned initially was only the following part: > can we remove the usage of forked `hive` in

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-20 Thread Cheng Lian
adoop-3.2 > profile. > > What do you mean by "only meaningful under the hadoop-3.2 profile"? > > On Tue, Nov 19, 2019 at 5:40 PM Cheng Lian wrote: > >> Hey Steve, >> >> In terms of Maven artifact, I don't think the default Hadoop version >>

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Cheng Lian
/Hive versions in Spark 3.0, I personally do not have a preference as long as the above two are met. On Wed, Nov 20, 2019 at 3:22 PM Cheng Lian wrote: > Dongjoon, I don't think we have any conflicts here. As stated in other > threads multiple times, as long as Hive 2.3 and Hadoop 3.2 v

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Cheng Lian
n't want to interact with this > Hive 1.2 fork, they can always use Hive 2.3 at their own risks. > > Specifically, what about having a profile `hive-1.2` at `3.0.0` with the > default Hive 2.3 pom at least? > How do you think about that way, Cheng? > > Bests, > Dongjoon

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Cheng Lian
Hey Dongjoon and Felix, I totally agree that Hive 2.3 is more stable than Hive 1.2. Otherwise, we wouldn't even consider integrating with Hive 2.3 in Spark 3.0. However, *"Hive" and "Hive integration in Spark" are two quite different things*, and I don't think anybody has ever mentioned "the

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-19 Thread Cheng Lian
.3` pre-built distribution, how do > you think about this, Sean? > The preparation is already started in another email thread and I believe > that is a keystone to prove `Hive 2.3` version stability > (which Cheng/Hyukjin/you asked). > > Bests, > Dongjoon. > > > On Tue, Nov 19, 20

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-19 Thread Cheng Lian
; >> ------ >> *From:* Steve Loughran >> *Sent:* Sunday, November 17, 2019 9:22:09 AM >> *To:* Cheng Lian >> *Cc:* Sean Owen ; Wenchen Fan ; >> Dongjoon Hyun ; dev ; >> Yuming Wang >> *Subject:* Re: Use Hadoop-3.2 as a

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-19 Thread Cheng Lian
It's kinda like Scala version upgrade. Historically, we only remove the support of an older Scala version when the newer version is proven to be stable after one or more Spark minor versions. On Tue, Nov 19, 2019 at 2:07 PM Cheng Lian wrote: > Hmm, what exactly did you mean by "remove t

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-19 Thread Cheng Lian
been reluctant to (1) and (2) due to its burden. >> But, it's time to prepare. Without them, we are going to be insufficient >> again and again. >> >> Bests, >> Dongjoon. >> >> >> >> >> On Tue, Nov 19, 2019 at 9:26 AM Cheng Lian wrote: >&

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-19 Thread Cheng Lian
Dongjoon, I'm with Hyukjin. There should be at least one Spark 3.x minor release to stabilize Hive 2.3 code paths before retiring the Hive 1.2 fork. Even today, the Hive 2.3.6 version bundled in Spark 3.0 is still buggy in terms of JDK 11 support. (BTW, I just found that our root POM is referring

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-16 Thread Cheng Lian
Dongjoon, I didn't follow the original Hive 2.3 discussion closely. I thought the original proposal was to replace Hive 1.2 with Hive 2.3, which seemed risky, and therefore we only introduced Hive 2.3 under the hadoop-3.2 profile without removing Hive 1.2. But maybe I'm totally wrong here...

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-15 Thread Cheng Lian
Cc Yuming, Steve, and Dongjoon On Fri, Nov 15, 2019 at 10:37 AM Cheng Lian wrote: > Similar to Xiao, my major concern about making Hadoop 3.2 the default > Hadoop version is quality control. The current hadoop-3.2 profile covers > too many major component upgrades, i.e.: > >

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-15 Thread Cheng Lian
Similar to Xiao, my major concern about making Hadoop 3.2 the default Hadoop version is quality control. The current hadoop-3.2 profile covers too many major component upgrades, i.e.: - Hadoop 3.2 - Hive 2.3 - JDK 11 We have already found and fixed some feature and performance

[jira] [Updated] (SPARK-29667) implicitly convert mismatched datatypes on right side of "IN" operator

2019-10-30 Thread Cheng Lian (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-29667: --- Environment: (was: spark-2.4.3-bin-dbr-5.5-snapshot-9833d0f) > implicitly convert mismatc

[jira] [Commented] (SPARK-29667) implicitly convert mismatched datatypes on right side of "IN" operator

2019-10-30 Thread Cheng Lian (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963305#comment-16963305 ] Cheng Lian commented on SPARK-29667: Reproduced this with the following snippet: {code} spark.range

[jira] [Updated] (SPARK-26806) EventTimeStats.merge doesn't handle "zero.merge(zero)" correctly

2019-10-10 Thread Cheng Lian (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-26806: --- Reporter: Cheng Lian (was: liancheng) > EventTimeStats.merge doesn't handle "zero.me

[jira] [Updated] (SPARK-26806) EventTimeStats.merge doesn't handle "zero.merge(zero)" correctly

2019-10-10 Thread Cheng Lian (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-26806: --- Description: Right now, EventTimeStats.merge doesn't handle "zero.merge(zero)". This

[jira] [Assigned] (SPARK-27369) Standalone worker can load resource conf and discover resources

2019-06-11 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian reassigned SPARK-27369: -- Assignee: wuyi > Standalone worker can load resource conf and discover resour

[jira] [Assigned] (SPARK-27611) Redundant javax.activation dependencies in the Maven build

2019-05-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian reassigned SPARK-27611: -- Assignee: Cheng Lian > Redundant javax.activation dependencies in the Maven bu

[jira] [Created] (SPARK-27611) Redundant javax.activation dependencies in the Maven build

2019-04-30 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-27611: -- Summary: Redundant javax.activation dependencies in the Maven build Key: SPARK-27611 URL: https://issues.apache.org/jira/browse/SPARK-27611 Project: Spark Issue

[jira] [Commented] (SPARK-25966) "EOF Reached the end of stream with bytes left to read" while reading/writing to Parquets

2018-11-07 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678595#comment-16678595 ] Cheng Lian commented on SPARK-25966: [~andrioni], just realized that I might misunderstand this part

[jira] [Comment Edited] (SPARK-25966) "EOF Reached the end of stream with bytes left to read" while reading/writing to Parquets

2018-11-07 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678542#comment-16678542 ] Cheng Lian edited comment on SPARK-25966 at 11/7/18 5:34 PM: - Hey

[jira] [Comment Edited] (SPARK-25966) "EOF Reached the end of stream with bytes left to read" while reading/writing to Parquets

2018-11-07 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678542#comment-16678542 ] Cheng Lian edited comment on SPARK-25966 at 11/7/18 5:34 PM: - Hey

[jira] [Commented] (SPARK-25966) "EOF Reached the end of stream with bytes left to read" while reading/writing to Parquets

2018-11-07 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678542#comment-16678542 ] Cheng Lian commented on SPARK-25966: Hey, [~andrioni], if you still have the original (potentially

[jira] [Assigned] (SPARK-24927) The hadoop-provided profile doesn't play well with Snappy-compressed Parquet files

2018-07-26 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian reassigned SPARK-24927: -- Assignee: Cheng Lian > The hadoop-provided profile doesn't play well with Snappy-compres

[jira] [Updated] (SPARK-24927) The hadoop-provided profile doesn't play well with Snappy-compressed Parquet files

2018-07-26 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-24927: --- Description: Reproduction: {noformat} wget https://archive.apache.org/dist/spark/spark-2.3.1/spark

[jira] [Commented] (SPARK-24927) The hadoop-provided profile doesn't play well with Snappy-compressed Parquet files

2018-07-26 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16557603#comment-16557603 ] Cheng Lian commented on SPARK-24927: Downgraded from blocker to major, since it's not a regression

[jira] [Updated] (SPARK-24927) The hadoop-provided profile doesn't play well with Snappy-compressed Parquet files

2018-07-26 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-24927: --- Priority: Major (was: Blocker) > The hadoop-provided profile doesn't play well with Sna

[jira] [Created] (SPARK-24927) The hadoop-provided profile doesn't play well with Snappy-compressed Parquet files

2018-07-26 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-24927: -- Summary: The hadoop-provided profile doesn't play well with Snappy-compressed Parquet files Key: SPARK-24927 URL: https://issues.apache.org/jira/browse/SPARK-24927

[jira] [Assigned] (SPARK-24895) Spark 2.4.0 Snapshot artifacts has broken metadata due to mismatched filenames

2018-07-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian reassigned SPARK-24895: -- Assignee: Eric Chang > Spark 2.4.0 Snapshot artifacts has broken metadata due to mismatc

[jira] [Updated] (SPARK-24895) Spark 2.4.0 Snapshot artifacts has broken metadata due to mismatched filenames

2018-07-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-24895: --- Description: Spark 2.4.0 has Maven build errors because artifacts uploaded to apache maven repo

Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-23 Thread Cheng Lian
+1 (binding) Passed all the tests, looks good. Cheng On 2/23/18 15:00, Holden Karau wrote: +1 (binding) PySpark artifacts install in a fresh Py3 virtual env On Feb 23, 2018 7:55 AM, "Denny Lee" > wrote: +1 (non-binding) On

[jira] [Commented] (SPARK-19737) New analysis rule for reporting unregistered functions without relying on relation resolution

2018-02-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372289#comment-16372289 ] Cheng Lian commented on SPARK-19737: [~LANDAIS Christophe], I filed SPARK-23486 for this. Should

[jira] [Updated] (SPARK-23486) LookupFunctions should not check the same function name more than once

2018-02-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-23486: --- Labels: starter (was: ) > LookupFunctions should not check the same function name more than o

[jira] [Commented] (SPARK-23486) LookupFunctions should not check the same function name more than once

2018-02-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372285#comment-16372285 ] Cheng Lian commented on SPARK-23486: Please refer to [this comment|https://issues.apache.org/jira

[jira] [Created] (SPARK-23486) LookupFunctions should not check the same function name more than once

2018-02-21 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-23486: -- Summary: LookupFunctions should not check the same function name more than once Key: SPARK-23486 URL: https://issues.apache.org/jira/browse/SPARK-23486 Project: Spark

[jira] [Resolved] (SPARK-22951) count() after dropDuplicates() on emptyDataFrame returns incorrect value

2018-01-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-22951. Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 20174 [https

[jira] [Assigned] (SPARK-22951) count() after dropDuplicates() on emptyDataFrame returns incorrect value

2018-01-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian reassigned SPARK-22951: -- Assignee: Feng Liu > count() after dropDuplicates() on emptyDataFrame returns incorrect va

[jira] [Updated] (SPARK-22951) count() after dropDuplicates() on emptyDataFrame returns incorrect value

2018-01-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-22951: --- Target Version/s: 2.3.0 > count() after dropDuplicates() on emptyDataFrame returns incorrect va

[jira] [Updated] (SPARK-22951) count() after dropDuplicates() on emptyDataFrame returns incorrect value

2018-01-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-22951: --- Labels: correctness (was: ) > count() after dropDuplicates() on emptyDataFrame returns incorr

[jira] [Commented] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic

2017-12-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16275422#comment-16275422 ] Cheng Lian commented on HADOOP-15086: - To be more specific, when multiple threads rename files

Re: [VOTE][SPIP] SPARK-22026 data source v2 write path

2017-10-16 Thread Cheng Lian
+1 On 10/12/17 20:10, Liwei Lin wrote: +1 ! Cheers, Liwei On Thu, Oct 12, 2017 at 7:11 PM, vaquar khan > wrote: +1 Regards, Vaquar khan On Oct 11, 2017 10:14 PM, "Weichen Xu"

[jira] [Assigned] (PARQUET-1102) Travis CI builds are failing for parquet-format PRs

2017-09-12 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian reassigned PARQUET-1102: --- Assignee: Cheng Lian > Travis CI builds are failing for parquet-format

[jira] [Resolved] (PARQUET-1091) Wrong and broken links in README

2017-09-12 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved PARQUET-1091. - Resolution: Fixed Fix Version/s: format-2.3.2 Issue resolved by pull request 65 [https

[jira] [Resolved] (PARQUET-1102) Travis CI builds are failing for parquet-format PRs

2017-09-12 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved PARQUET-1102. - Resolution: Fixed Fix Version/s: format-2.3.2 Issue resolved by pull request 66 [https

[jira] [Updated] (PARQUET-1102) Travis CI builds are failing for parquet-format PRs

2017-09-12 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated PARQUET-1102: Priority: Blocker (was: Major) > Travis CI builds are failing for parquet-format

[jira] [Created] (PARQUET-1102) Travis CI builds are failing for parquet-format PRs

2017-09-12 Thread Cheng Lian (JIRA)
Cheng Lian created PARQUET-1102: --- Summary: Travis CI builds are failing for parquet-format PRs Key: PARQUET-1102 URL: https://issues.apache.org/jira/browse/PARQUET-1102 Project: Parquet Issue

[jira] [Created] (PARQUET-1091) Wrong and broken links in README

2017-09-07 Thread Cheng Lian (JIRA)
Cheng Lian created PARQUET-1091: --- Summary: Wrong and broken links in README Key: PARQUET-1091 URL: https://issues.apache.org/jira/browse/PARQUET-1091 Project: Parquet Issue Type: Bug

[jira] [Updated] (HADOOP-14700) NativeAzureFileSystem.open() ignores blob container name

2017-08-02 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated HADOOP-14700: Description: {{NativeAzureFileSystem}} instances are associated with the blob container used

[jira] [Updated] (HADOOP-14700) NativeAzureFileSystem.open() ignores blob container name

2017-08-02 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated HADOOP-14700: Description: {{NativeAzureFileSystem}} instances are associated with the blob container used

[jira] [Commented] (HADOOP-14700) NativeAzureFileSystem.open() ignores blob container name

2017-08-02 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111645#comment-16111645 ] Cheng Lian commented on HADOOP-14700: - Oops... Thanks for pointing out the typo, [~ste...@apache.org

[jira] [Updated] (HADOOP-14700) NativeAzureFileSystem.open() ignores blob container name

2017-08-02 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated HADOOP-14700: Description: {{NativeAzureFileSystem}} instances are associated with the blob container used

[jira] [Updated] (HADOOP-14700) NativeAzureFileSystem.open() ignores blob container name

2017-07-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated HADOOP-14700: Description: {{NativeAzureFileSystem}} instances are associated with the blob container used

[jira] [Created] (HADOOP-14700) NativeAzureFileSystem.open() ignores blob container name

2017-07-28 Thread Cheng Lian (JIRA)
Cheng Lian created HADOOP-14700: --- Summary: NativeAzureFileSystem.open() ignores blob container name Key: HADOOP-14700 URL: https://issues.apache.org/jira/browse/HADOOP-14700 Project: Hadoop Common

[jira] [Created] (HADOOP-14700) NativeAzureFileSystem.open() ignores blob container name

2017-07-28 Thread Cheng Lian (JIRA)
Cheng Lian created HADOOP-14700: --- Summary: NativeAzureFileSystem.open() ignores blob container name Key: HADOOP-14700 URL: https://issues.apache.org/jira/browse/HADOOP-14700 Project: Hadoop Common

[jira] [Assigned] (SPARK-9686) Spark Thrift server doesn't return correct JDBC metadata

2017-07-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian reassigned SPARK-9686: - Assignee: (was: Cheng Lian) > Spark Thrift server doesn't return correct JDBC metad

[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1

2017-06-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043478#comment-16043478 ] Cheng Lian commented on SPARK-20958: [~marmbrus], here is the draft release note entry: {quote} SPARK

[jira] [Updated] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1

2017-06-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-20958: --- Labels: release-notes release_notes releasenotes (was: release-notes) > Roll back parquet-mr 1.

[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1

2017-06-02 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035149#comment-16035149 ] Cheng Lian commented on SPARK-20958: Thanks [~rdblue]! I'm also reluctant to roll it back considering

[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1

2017-06-02 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034310#comment-16034310 ] Cheng Lian commented on SPARK-20958: [~rdblue] I think the root cause here is we cherry-picked

[jira] [Updated] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1

2017-06-02 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-20958: --- Description: We recently realized that parquet-mr 1.8.2 used by Spark 2.2.0-rc2 depends on avro

[jira] [Created] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1

2017-06-01 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-20958: -- Summary: Roll back parquet-mr 1.8.2 to parquet-1.8.1 Key: SPARK-20958 URL: https://issues.apache.org/jira/browse/SPARK-20958 Project: Spark Issue Type: Bug

[jira] [Comment Edited] (PARQUET-980) Cannot read row group larger than 2GB

2017-05-11 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007326#comment-16007326 ] Cheng Lian edited comment on PARQUET-980 at 5/11/17 10:46 PM: -- The current

[jira] [Commented] (PARQUET-980) Cannot read row group larger than 2GB

2017-05-11 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007326#comment-16007326 ] Cheng Lian commented on PARQUET-980: The current write path ensures that it never writes a page

[jira] [Updated] (PARQUET-980) Cannot read row group larger than 2GB

2017-05-11 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated PARQUET-980: --- Affects Version/s: 1.8.1 1.8.2 > Cannot read row group larger than

[jira] [Updated] (SPARK-20132) Add documentation for column string functions

2017-05-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-20132: --- Fix Version/s: 2.2.0 > Add documentation for column string functi

[jira] [Updated] (SPARK-20246) Should check determinism when pushing predicates down through aggregation

2017-04-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-20246: --- Labels: correctness (was: ) > Should check determinism when pushing predicates down thro

[jira] [Commented] (SPARK-20246) Should check determinism when pushing predicates down through aggregation

2017-04-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959946#comment-15959946 ] Cheng Lian commented on SPARK-20246: [This line|https://github.com/apache/spark/blob

[jira] [Updated] (SPARK-19716) Dataset should allow by-name resolution for struct type elements in array

2017-04-04 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-19716: --- Fix Version/s: (was: 2.3.0) 2.2.0 > Dataset should allow by-name resolut

[jira] [Resolved] (SPARK-19716) Dataset should allow by-name resolution for struct type elements in array

2017-04-04 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-19716. Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 17398 [https

[jira] [Assigned] (SPARK-19716) Dataset should allow by-name resolution for struct type elements in array

2017-04-04 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian reassigned SPARK-19716: -- Assignee: Wenchen Fan > Dataset should allow by-name resolution for struct type eleme

[jira] [Updated] (SPARK-19912) String literals are not escaped while performing Hive metastore level partition pruning

2017-03-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-19912: --- Summary: String literals are not escaped while performing Hive metastore level partition pruning

[jira] [Updated] (SPARK-19912) String literals are not escaped while performing partition pruning at Hive metastore level

2017-03-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-19912: --- Description: {{Shim_v0_13.convertFilters()}} doesn't escape string literals while generating Hive

[jira] [Updated] (SPARK-19887) __HIVE_DEFAULT_PARTITION__ is not interpreted as NULL partition value in partitioned persisted tables

2017-03-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-19887: --- Labels: correctness (was: ) > __HIVE_DEFAULT_PARTITION__ is not interpreted as NULL partition va

[jira] [Created] (SPARK-19912) String literals are not escaped while performing partition pruning at Hive metastore level

2017-03-10 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-19912: -- Summary: String literals are not escaped while performing partition pruning at Hive metastore level Key: SPARK-19912 URL: https://issues.apache.org/jira/browse/SPARK-19912

[jira] [Updated] (SPARK-19887) __HIVE_DEFAULT_PARTITION__ is not interpreted as NULL partition value in partitioned persisted tables

2017-03-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-19887: --- Affects Version/s: 2.2.0 > __HIVE_DEFAULT_PARTITION__ is not interpreted as NULL partition va

[jira] [Created] (SPARK-19905) Dataset.inputFiles is broken for Hive SerDe tables

2017-03-10 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-19905: -- Summary: Dataset.inputFiles is broken for Hive SerDe tables Key: SPARK-19905 URL: https://issues.apache.org/jira/browse/SPARK-19905 Project: Spark Issue Type

[jira] [Updated] (SPARK-19887) __HIVE_DEFAULT_PARTITION__ is not interpreted as NULL partition value in partitioned persisted tables

2017-03-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-19887: --- Description: The following Spark shell snippet under Spark 2.1 reproduces this issue: {code} val

[jira] [Updated] (SPARK-19887) __HIVE_DEFAULT_PARTITION__ is not interpreted as NULL partition value in partitioned persisted tables

2017-03-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-19887: --- Description: The following Spark shell snippet under Spark 2.1 reproduces this issue: {code} val

[jira] [Updated] (SPARK-19887) __HIVE_DEFAULT_PARTITION__ is not interpreted as NULL partition value in partitioned persisted tables

2017-03-09 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-19887: --- Summary: __HIVE_DEFAULT_PARTITION__ is not interpreted as NULL partition value in partitioned

[jira] [Created] (SPARK-19887) __HIVE_DEFAULT_PARTITION__ not interpreted as NULL partition value in partitioned persisted tables

2017-03-09 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-19887: -- Summary: __HIVE_DEFAULT_PARTITION__ not interpreted as NULL partition value in partitioned persisted tables Key: SPARK-19887 URL: https://issues.apache.org/jira/browse/SPARK-19887

[jira] [Commented] (SPARK-19887) __HIVE_DEFAULT_PARTITION__ not interpreted as NULL partition value in partitioned persisted tables

2017-03-09 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15903749#comment-15903749 ] Cheng Lian commented on SPARK-19887: cc [~cloud_fan] > __HIVE_DEFAULT_PARTITION__ not interpre

[jira] [Resolved] (SPARK-19737) New analysis rule for reporting unregistered functions without relying on relation resolution

2017-03-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-19737. Resolution: Fixed Issue resolved by pull request 17168 [https://github.com/apache/spark/pull/17168

[jira] [Assigned] (SPARK-19737) New analysis rule for reporting unregistered functions without relying on relation resolution

2017-03-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian reassigned SPARK-19737: -- Assignee: Cheng Lian > New analysis rule for reporting unregistered functions without rely

[jira] [Updated] (SPARK-19737) New analysis rule for reporting unregistered functions without relying on relation resolution

2017-03-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-19737: --- Description: Let's consider the following simple SQL query that reference an undefined function

[jira] [Updated] (SPARK-19737) New analysis rule for reporting unregistered functions without relying on relation resolution

2017-03-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-19737: --- Description: Let's consider the following simple SQL query that reference an undefined function

[jira] [Updated] (SPARK-19737) New analysis rule for reporting unregistered functions without relying on relation resolution

2017-02-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-19737: --- Description: Let's consider the following simple SQL query that reference an invalid function {{foo

[jira] [Updated] (SPARK-19737) New analysis rule for reporting unregistered functions without relying on relation resolution

2017-02-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-19737: --- Description: Let's consider the following simple SQL query that reference an invalid function {{foo

[jira] [Created] (SPARK-19737) New analysis rule for reporting unregistered functions without relying on relation resolution

2017-02-24 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-19737: -- Summary: New analysis rule for reporting unregistered functions without relying on relation resolution Key: SPARK-19737 URL: https://issues.apache.org/jira/browse/SPARK-19737

Re: The driver hangs at DataFrame.rdd in Spark 2.1.0

2017-02-23 Thread Cheng Lian
? -- Original -- *From: * "Cheng Lian-3 [via Apache Spark Developers List]";<[hidden email] >; *Send time:* Thursday, Feb 23, 2017 9:43 AM *To:* "Stan Zhai"<[hidden email] >; *Subject: * Re: The driver hangs at DataFrame.rdd in Spark 2.1.0 Just from the th

[jira] [Updated] (PARQUET-893) GroupColumnIO.getFirst() doesn't check for empty groups

2017-02-22 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated PARQUET-893: --- Description: The following Spark snippet reproduces this issue with Spark 2.1 (with parquet-mr

[jira] [Created] (PARQUET-893) GroupColumnIO.getFirst() doesn't check for empty groups

2017-02-22 Thread Cheng Lian (JIRA)
Cheng Lian created PARQUET-893: -- Summary: GroupColumnIO.getFirst() doesn't check for empty groups Key: PARQUET-893 URL: https://issues.apache.org/jira/browse/PARQUET-893 Project: Parquet Issue

[jira] [Updated] (SPARK-19529) TransportClientFactory.createClient() shouldn't call awaitUninterruptibly()

2017-02-13 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-19529: --- Target Version/s: 1.6.3, 2.0.3, 2.1.1, 2.2.0 (was: 2.0.3, 2.1.1, 2.2.0

[jira] [Updated] (SPARK-19529) TransportClientFactory.createClient() shouldn't call awaitUninterruptibly()

2017-02-13 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-19529: --- Target Version/s: 2.0.3, 2.1.1, 2.2.0 (was: 2.0.3, 2.1.1) > TransportClientFactory.createCli

[jira] [Updated] (SPARK-18717) Datasets - crash (compile exception) when mapping to immutable scala map

2017-02-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-18717: --- Fix Version/s: 2.1.1 > Datasets - crash (compile exception) when mapping to immutable scala

[jira] [Updated] (SPARK-18717) Datasets - crash (compile exception) when mapping to immutable scala map

2017-02-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-18717: --- Affects Version/s: 2.1.0 > Datasets - crash (compile exception) when mapping to immutable scala

  1   2   3   4   5   6   7   8   9   10   >