[jira] [Closed] (SPARK-12444) A lightweight Scala DSL for schema construction

2015-12-22 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian closed SPARK-12444. -- Resolution: Won't Fix > A lightweight Scala DSL for schema construct

[jira] [Resolved] (SPARK-12371) Make sure non-nullable arguments of NewInstance don't receive null input data

2015-12-22 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-12371. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10331 [https

[jira] [Commented] (SPARK-12478) Dataset fields of product types can't be null

2015-12-22 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068043#comment-15068043 ] Cheng Lian commented on SPARK-12478: [~marmbrus] I guess this issue probably doesn't block 1.6 since

[jira] [Resolved] (SPARK-9271) Concurrency bug triggered by partition predicate push-down

2015-12-22 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-9271. --- Resolution: Cannot Reproduce > Concurrency bug triggered by partition predicate push-d

[jira] [Commented] (SPARK-9271) Concurrency bug triggered by partition predicate push-down

2015-12-22 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068434#comment-15068434 ] Cheng Lian commented on SPARK-9271: --- I only observed this issue in the context of PR #7492. Since

[jira] [Updated] (SPARK-12371) Make sure non-nullable arguments of NewInstance don't receive null input data

2015-12-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12371: --- Description: When extracting objects from underlying rows using a Dataset, it's possible

[jira] [Updated] (SPARK-12371) Make sure non-nullable arguments of NewInstance don't receive null input data

2015-12-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12371: --- Summary: Make sure non-nullable arguments of NewInstance don't receive null input data (was: Make

[jira] [Created] (SPARK-12444) A lightweight Scala DSL for schema construction

2015-12-20 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12444: -- Summary: A lightweight Scala DSL for schema construction Key: SPARK-12444 URL: https://issues.apache.org/jira/browse/SPARK-12444 Project: Spark Issue Type

[jira] [Updated] (SPARK-12369) DataFrameReader fails on globbing parquet paths that contain nonexistent path(s)

2015-12-19 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12369: --- Summary: DataFrameReader fails on globbing parquet paths that contain nonexistent path(s

[jira] [Updated] (SPARK-12369) DataFrameReader fails on globbing parquet paths that contain nonexistent path(s)

2015-12-19 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12369: --- Shepherd: Cheng Lian > DataFrameReader fails on globbing parquet paths that contain nonexist

[jira] [Commented] (SPARK-11941) JSON representation of nested StructTypes could be more uniform

2015-12-19 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065334#comment-15065334 ] Cheng Lian commented on SPARK-11941: One thing is that, the JSON representation of a DataFrame schema

[jira] [Commented] (SPARK-11148) Unable to create views

2015-12-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063721#comment-15063721 ] Cheng Lian commented on SPARK-11148: Did you mean the Windows ODBC driver provided by Simba? AFAIK

[jira] [Created] (SPARK-12406) Simple join query throws exception that complains generated classes are not found

2015-12-17 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12406: -- Summary: Simple join query throws exception that complains generated classes are not found Key: SPARK-12406 URL: https://issues.apache.org/jira/browse/SPARK-12406

[jira] [Updated] (SPARK-12406) Simple join query throws exception that complains generated classes are not found

2015-12-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12406: --- Description: This should be a regression of [PR #9923|https://github.com/apache/spark/pull/9923

[jira] [Commented] (SPARK-12406) Simple join query throws exception that complains generated classes are not found

2015-12-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061994#comment-15061994 ] Cheng Lian commented on SPARK-12406: cc [~vanzin] > Simple join query throws except

[jira] [Updated] (SPARK-12406) Simple join query throws exception that complains generated classes are not found

2015-12-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12406: --- Description: This should be a regression of [PR #9923|https://github.com/apache/spark/pull/9923

[jira] [Updated] (SPARK-12406) Simple join query throws exception that complains generated classes are not found

2015-12-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12406: --- Description: This should be a regression of [PR #9923|https://github.com/apache/spark/pull/9923

[jira] [Updated] (SPARK-12406) Codegen'd classes can't be found under REPL

2015-12-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12406: --- Summary: Codegen'd classes can't be found under REPL (was: Simple join query throws exception

[jira] [Updated] (SPARK-12406) Simple join query throws exception that complains generated classes are not found

2015-12-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12406: --- Priority: Blocker (was: Major) > Simple join query throws exception that complains genera

[jira] [Updated] (SPARK-12406) Codegen'd classes can't be found under REPL

2015-12-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12406: --- Description: This should be a regression of [PR #9923|https://github.com/apache/spark/pull/9923

[jira] [Created] (SPARK-12371) Make sure Dataset nullability conforms to its underlying logical plan

2015-12-16 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12371: -- Summary: Make sure Dataset nullability conforms to its underlying logical plan Key: SPARK-12371 URL: https://issues.apache.org/jira/browse/SPARK-12371 Project: Spark

[jira] [Created] (SPARK-12336) Outer join using multiple columns results in wrong nullability

2015-12-15 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12336: -- Summary: Outer join using multiple columns results in wrong nullability Key: SPARK-12336 URL: https://issues.apache.org/jira/browse/SPARK-12336 Project: Spark

[jira] [Created] (SPARK-12335) CentralMomentAgg should be nullable

2015-12-15 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12335: -- Summary: CentralMomentAgg should be nullable Key: SPARK-12335 URL: https://issues.apache.org/jira/browse/SPARK-12335 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-12335) CentralMomentAgg should be nullable

2015-12-15 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12335: --- Issue Type: Sub-task (was: Bug) Parent: SPARK-12323 > CentralMomentAgg should be nulla

[jira] [Updated] (SPARK-12336) Outer join using multiple columns results in wrong nullability

2015-12-15 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12336: --- Issue Type: Sub-task (was: Bug) Parent: SPARK-12323 > Outer join using multiple colu

[jira] [Created] (SPARK-12342) Corr (Pearson correlation) should be nullable

2015-12-15 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12342: -- Summary: Corr (Pearson correlation) should be nullable Key: SPARK-12342 URL: https://issues.apache.org/jira/browse/SPARK-12342 Project: Spark Issue Type: Sub

[jira] [Created] (SPARK-12341) The "comment" field of DESCRIBE result set should be nullable

2015-12-15 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12341: -- Summary: The "comment" field of DESCRIBE result set should be nullable Key: SPARK-12341 URL: https://issues.apache.org/jira/browse/SPARK-12341 Proj

Re: parquet file doubts

2015-12-14 Thread Cheng Lian
istics().isEmpty()) out.format(" STA:[%s]", meta.getStatistics().toString()); Cheng On 12/14/15 8:42 PM, Shushant Arora wrote: Hi Do you have any sample program in java to validate/read min max of column groups in Parquet file? Thanks On Tue, Dec 8, 2015 at 2:50 PM, Cheng Lian <

Re: memory leak when saving Parquet files in Spark

2015-12-14 Thread Cheng Lian
hanks, -Matt On Fri, Dec 11, 2015 at 1:58 AM, Cheng Lian <l...@databricks.com <mailto:l...@databricks.com>> wrote: This is probably caused by schema merging. Were you using Spark 1.4 or earlier versions? Could you please try the following snippet to see whether it he

[jira] [Created] (SPARK-12323) Don't assign default value for non-nullable columns of a Dataset

2015-12-14 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12323: -- Summary: Don't assign default value for non-nullable columns of a Dataset Key: SPARK-12323 URL: https://issues.apache.org/jira/browse/SPARK-12323 Project: Spark

[jira] [Updated] (SPARK-10364) Support Parquet logical type TIMESTAMP_MILLIS

2015-12-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10364: --- Description: The {{TimestampType}} in Spark SQL is of microsecond precision. Ideally, we should
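
To illustrate the precision gap mentioned in the description above (microsecond `TimestampType` versus a millisecond-precision Parquet type), here is a minimal standalone sketch. It is not Spark's or Parquet's actual conversion code; the class and method names (`TimestampPrecision`, `microsToMillis`, `millisToMicros`) are hypothetical, and flooring is only one possible rounding choice.

```java
// Hypothetical helper, not part of Spark or Parquet: shows what is lost when a
// microsecond-precision timestamp is stored with millisecond precision.
final class TimestampPrecision {
    private static final long MICROS_PER_MILLI = 1_000L;

    /** Floors microseconds since the epoch to whole milliseconds (also correct for negative values). */
    static long microsToMillis(long micros) {
        return Math.floorDiv(micros, MICROS_PER_MILLI);
    }

    /** Widens milliseconds since the epoch to microseconds; throws ArithmeticException on overflow. */
    static long millisToMicros(long millis) {
        return Math.multiplyExact(millis, MICROS_PER_MILLI);
    }

    public static void main(String[] args) {
        long micros = 1_450_000_000_123_456L;
        long roundTripped = millisToMicros(microsToMillis(micros));
        // The last three digits (the sub-millisecond part) do not survive the round trip.
        System.out.println(micros + " -> " + roundTripped);
        // prints "1450000000123456 -> 1450000000123000"
    }
}
```

Widening from milliseconds to microseconds is lossless, so under these assumptions reading a millisecond value into a microsecond type is safe, while writing in the other direction drops the sub-millisecond digits.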

Re: About the bottleneck of parquet file reading in Spark

2015-12-10 Thread Cheng Lian
Cc Spark user list since this information is generally useful. On Thu, Dec 10, 2015 at 3:31 PM, Lionheart <87249...@qq.com> wrote: > Dear, Cheng > I'm a user of Spark. Our current Spark version is 1.4.1 > In our project, I find there is a bottleneck when loading huge amount > of

[jira] [Resolved] (SPARK-10366) Support Parquet logical type DATE

2015-12-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-10366. Resolution: Fixed Actually this has already been implemented since at least 1.4. > Supp

Re: [build system] jenkins downtime, thursday 12/10/15 7am PDT

2015-12-10 Thread Cheng Lian
Hi Shane, I found that Jenkins has been in the status of "Jenkins is going to shut down" for at least 4 hours (from ~23:30 Dec 9 to 3:45 Dec 10, PDT). Not sure whether this is part of the schedule or related? Cheng On Thu, Dec 10, 2015 at 3:56 AM, shane knapp wrote: >

Re: memory leak when saving Parquet files in Spark

2015-12-10 Thread Cheng Lian
This is probably caused by schema merging. Were you using Spark 1.4 or earlier versions? Could you please try the following snippet to see whether it helps: df.write .format("parquet") .option("mergeSchema", "false") .partitionBy(partitionCols: _*) .mode(saveMode) .save(targetPath)

[jira] [Resolved] (SPARK-12131) Cannot create ExpressionEncoder for Array[T] where T is a nested class

2015-12-09 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-12131. Resolution: Fixed Fix Version/s: 2.0.0 Resolved by https://github.com/apache/spark/pull

[jira] [Resolved] (SPARK-12252) refactor MapObjects to make it less hacky

2015-12-09 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-12252. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10239 [https

Re: parquet file doubts

2015-12-08 Thread Cheng Lian
com <mailto:absi...@informatica.com>> wrote: Yes, Parquet has min/max. *From:*Cheng Lian [mailto:l...@databricks.com <mailto:l...@databricks.com>] *Sent:* Monday, December 07, 2015 11:21 AM *To:* Ted Yu *Cc:* Shushant Arora; u...@spark.apache.org <mailto:

Re: parquet file doubts

2015-12-08 Thread Cheng Lian
com <mailto:absi...@informatica.com>> wrote: Yes, Parquet has min/max. *From:*Cheng Lian [mailto:l...@databricks.com <mailto:l...@databricks.com>] *Sent:* Monday, December 07, 2015 11:21 AM *To:* Ted Yu *Cc:* Shushant Arora; user@spark.apache.org <mailto:

[jira] [Resolved] (SPARK-11676) Parquet filter tests all pass if filters are not really pushed down

2015-12-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-11676. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 9659 [https

Re: parquet file doubts

2015-12-06 Thread Cheng Lian
cc parquet-dev list (it would be nice to always do so for these general questions.) Cheng On 12/6/15 3:10 PM, Shushant Arora wrote: Hi, I have a few doubts about the Parquet file format. 1. Does Parquet keep min/max statistics like ORC does? How can I see the Parquet version (whether it's 1.1, 1.2, or 1.3) for

Re: parquet file doubts

2015-12-06 Thread Cheng Lian
, Ted Yu wrote: Cheng: I only see user@spark in the CC. FYI On Sun, Dec 6, 2015 at 8:01 PM, Cheng Lian <l...@databricks.com <mailto:l...@databricks.com>> wrote: cc parquet-dev list (it would be nice to always do so for these general questions.) Cheng On 12/

[jira] [Created] (SPARK-12131) Cannot create ExpressionEncoder for Array[T] where T is a nested class

2015-12-03 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12131: -- Summary: Cannot create ExpressionEncoder for Array[T] where T is a nested class Key: SPARK-12131 URL: https://issues.apache.org/jira/browse/SPARK-12131 Project: Spark

[jira] [Created] (SPARK-12094) Better format for query plan tree string

2015-12-02 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12094: -- Summary: Better format for query plan tree string Key: SPARK-12094 URL: https://issues.apache.org/jira/browse/SPARK-12094 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-12094) Better format for query plan tree string

2015-12-02 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12094: --- Description: When examining plans of complex queries with multiple joins, a pain point of mine

Re: df.partitionBy().parquet() java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-12-02 Thread Cheng Lian
You may try to set Hadoop conf "parquet.enable.summary-metadata" to false to disable writing Parquet summary files (_metadata and _common_metadata). By default Parquet writes the summary files by collecting footers of all part-files in the dataset while committing the job. Spark also follows

[jira] [Created] (PARQUET-398) Testing JIRA ticket for testing committership

2015-12-02 Thread Cheng Lian (JIRA)
Cheng Lian created PARQUET-398: -- Summary: Testing JIRA ticket for testing committership Key: PARQUET-398 URL: https://issues.apache.org/jira/browse/PARQUET-398 Project: Parquet Issue Type: Test

[jira] [Created] (SPARK-12046) Visibility and format issues in ScalaDoc/JavaDoc for branch-1.6

2015-11-30 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12046: -- Summary: Visibility and format issues in ScalaDoc/JavaDoc for branch-1.6 Key: SPARK-12046 URL: https://issues.apache.org/jira/browse/SPARK-12046 Project: Spark

[jira] [Created] (SPARK-12047) Unhelpful error messages generated by JavaDoc while doing sbt unidoc

2015-11-30 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12047: -- Summary: Unhelpful error messages generated by JavaDoc while doing sbt unidoc Key: SPARK-12047 URL: https://issues.apache.org/jira/browse/SPARK-12047 Project: Spark

[jira] [Commented] (SPARK-9182) filter and groupBy on DataFrames are not passed through to jdbc source

2015-11-30 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031526#comment-15031526 ] Cheng Lian commented on SPARK-9182: --- Converting all {{expressions.Filter}} to {{sources.Filter

Re: DateTime Support - Hive Parquet

2015-11-29 Thread Cheng Lian
for this case? Do you convert on insert or on RDD to DF conversion? Regards, Bryan Jeffrey Sent from Outlook Mail *From: *Cheng Lian *Sent: *Tuesday, November 24, 2015 6:49 AM *To: *Bryan;user *Subject: *Re: DateTime Support - Hive Parquet I see, then this is actually irrelevant to Parquet. I

Re: Parquet files not getting coalesced to smaller number of files

2015-11-29 Thread Cheng Lian
RDD.coalesce(n) returns a new RDD rather than modifying the original RDD. So what you need is: metricsToBeSaved.coalesce(1500).saveAsNewAPIHadoopFile(...) Cheng On 11/29/15 12:21 PM, SRK wrote: Hi, I have the following code that saves the parquet files in my hourly batch to hdfs. My

[jira] [Commented] (SPARK-12000) `sbt publishLocal` hits a Scala compiler bug caused by `Since` annotation

2015-11-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030453#comment-15030453 ] Cheng Lian commented on SPARK-12000: Yeah, also found that the error message IS harmful. By switching

[jira] [Commented] (SPARK-12000) `sbt publishLocal` hits a Scala compiler bug caused by `Since` annotation

2015-11-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030428#comment-15030428 ] Cheng Lian commented on SPARK-12000: Hit exactly the same issue and worked it around by switching

[jira] [Created] (SPARK-12012) Show more comprehensive PhysicalRDD metadata when visualizing SQL query plan

2015-11-26 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12012: -- Summary: Show more comprehensive PhysicalRDD metadata when visualizing SQL query plan Key: SPARK-12012 URL: https://issues.apache.org/jira/browse/SPARK-12012 Project

[jira] [Updated] (SPARK-12012) Show more comprehensive PhysicalRDD metadata when visualizing SQL query plan

2015-11-26 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12012: --- Description: Currently a {{PhysicalRDD}} operator is just visualized as a node with nothing

[jira] [Resolved] (SPARK-11043) Hive Thrift Server will log warn "Couldn't find log associated with operation handle"

2015-11-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-11043. Resolution: Fixed Fix Version/s: 1.6.0 1.7.0 Issue resolved by pull

[jira] [Updated] (SPARK-11043) Hive Thrift Server will log warn "Couldn't find log associated with operation handle"

2015-11-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-11043: --- Assignee: SaintBacchus > Hive Thrift Server will log warn "Couldn't find log as

Re: DateTime Support - Hive Parquet

2015-11-24 Thread Cheng Lian
(to nanos, Timestamp, etc) prior to writing records to hive. Regards, Bryan Jeffrey Sent from Outlook Mail *From: *Cheng Lian *Sent: *Tuesday, November 24, 2015 1:42 AM *To: *Bryan Jeffrey;user *Subject: *Re: DateTime Support - Hive Parquet Hey Bryan, What do you mean by "DateTime prope

[jira] [Updated] (SPARK-9141) DataFrame recomputed instead of using cached parent.

2015-11-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-9141: -- Description: As I understand, DataFrame.cache() is supposed to work the same as RDD.cache(), so

Re: [ANNOUNCE] New Parquet committer: Cheng Lian

2015-11-23 Thread Cheng Lian
It's an honor, thanks for the trust! :) Cheng On 11/24/15 2:12 PM, Julien Le Dem wrote: Welcome Cheng On Mon, Nov 23, 2015 at 5:51 PM, Ryan Blue <b...@cloudera.com> wrote: On behalf of the Parquet PMC, I'm happy to announce that Cheng Lian has been invited to be a committer on the p

Re: DateTime Support - Hive Parquet

2015-11-23 Thread Cheng Lian
Hey Bryan, What do you mean by "DateTime properties"? Hive and Spark SQL both support DATE and TIMESTAMP types, but there's no DATETIME type. So I assume you are referring to Java class DateTime (possibly the one in joda)? Could you please provide a sample snippet that illustrates your
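
Since the message above notes that Spark SQL has TIMESTAMP but no DATETIME type, one common approach is to convert such values to `java.sql.Timestamp`, which Spark SQL maps to its timestamp type, before building rows. The sketch below is illustrative only and uses just the JDK; the class and method names (`DateTimeToTimestamp`, `fromEpochMillis`, `fromInstant`) are hypothetical, and it assumes the source value can be expressed as epoch milliseconds (for example via Joda-Time's `DateTime.getMillis()`) or as a `java.time.Instant`.

```java
import java.sql.Timestamp;
import java.time.Instant;

// Hypothetical helper: turns date-time values into java.sql.Timestamp.
final class DateTimeToTimestamp {

    /** Converts milliseconds since the epoch (e.g. Joda's DateTime.getMillis()) to a Timestamp. */
    static Timestamp fromEpochMillis(long epochMillis) {
        return new Timestamp(epochMillis);
    }

    /** Converts a java.time.Instant to a Timestamp, keeping its nanosecond field. */
    static Timestamp fromInstant(Instant instant) {
        return Timestamp.from(instant);
    }

    public static void main(String[] args) {
        Timestamp ts = fromEpochMillis(1_448_323_320_000L);
        // getTime() returns the same epoch milliseconds that went in.
        System.out.println(ts.getTime());
    }
}
```

Whether to convert on insert or when building the DataFrame is a design choice the thread itself discusses; this sketch only covers the value conversion.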

[jira] [Commented] (SPARK-10367) Support Parquet logical type INTERVAL

2015-11-19 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013199#comment-15013199 ] Cheng Lian commented on SPARK-10367: Dropped. > Support Parquet logical type INTER

[jira] [Updated] (SPARK-10367) Support Parquet logical type INTERVAL

2015-11-19 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10367: --- Target Version/s: (was: 1.6.0) > Support Parquet logical type INTER

Re: dounbts on parquet

2015-11-19 Thread Cheng Lian
ns where this rdd will lend.Have you used multiple output formats in spark? On Fri, Nov 13, 2015 at 3:56 PM, Cheng Lian <lian.cs@gmail.com <mailto:lian.cs@gmail.com>> wrote: Oh I see. Then parquet-avro should probably be more useful. AFAIK, parquet-hive is only

[jira] [Updated] (SPARK-9686) Spark Thrift server doesn't return correct JDBC metadata

2015-11-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-9686: -- Priority: Blocker (was: Major) > Spark Thrift server doesn't return correct JDBC metad

[jira] [Updated] (SPARK-11785) When deployed against remote Hive metastore with lower versions, JDBC metadata calls throws exception

2015-11-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-11785: --- Priority: Blocker (was: Major) > When deployed against remote Hive metastore with lower versi

[jira] [Assigned] (SPARK-11783) When deployed against remote Hive metastore, HiveContext.executionHive points to wrong metastore

2015-11-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian reassigned SPARK-11783: -- Assignee: Cheng Lian > When deployed against remote Hive metastore, HiveContext.executionH

[jira] [Updated] (SPARK-11785) When deployed against remote Hive metastore with lower versions, JDBC metadata calls throws exception

2015-11-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-11785: --- Assignee: Cheng Lian > When deployed against remote Hive metastore with lower versions, J

[jira] [Commented] (SPARK-9686) Spark Thrift server doesn't return correct JDBC metadata

2015-11-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010792#comment-15010792 ] Cheng Lian commented on SPARK-9686: --- [~navis] Thanks for the information! I'll try your patch shortly

[jira] [Updated] (SPARK-11694) Parquet logical types are not being tested properly

2015-11-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-11694: --- Fix Version/s: 1.7.0 > Parquet logical types are not being tested prope

[jira] [Commented] (SPARK-11153) Turns off Parquet filter push-down for string and binary columns

2015-11-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008827#comment-15008827 ] Cheng Lian commented on SPARK-11153: You are right that Impala and Hive write timestamps as Parquet

[jira] [Created] (SPARK-11783) When deployed against remote Hive metastore, HiveContext.executionHive points to wrong metastore

2015-11-17 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-11783: -- Summary: When deployed against remote Hive metastore, HiveContext.executionHive points to wrong metastore Key: SPARK-11783 URL: https://issues.apache.org/jira/browse/SPARK-11783

[jira] [Commented] (SPARK-9686) Spark hive jdbc client cannot get table from metadata store

2015-11-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009033#comment-15009033 ] Cheng Lian commented on SPARK-9686: --- Tested 1.7-SNAPSHOT ([fa13301|https://github.com/apache/spark

[jira] [Commented] (SPARK-11153) Turns off Parquet filter push-down for string and binary columns

2015-11-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009057#comment-15009057 ] Cheng Lian commented on SPARK-11153: Yes please file a ticket. One complication here

[jira] [Commented] (SPARK-11783) When deployed against remote Hive metastore, HiveContext.executionHive points to wrong metastore

2015-11-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009098#comment-15009098 ] Cheng Lian commented on SPARK-11783: This [comment|https://issues.apache.org/jira/browse/SPARK-9686

[jira] [Updated] (SPARK-6776) Implement backwards-compatibility rules in Catalyst converters (which convert Parquet record to rows)

2015-11-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-6776: -- Fix Version/s: 1.5.0 > Implement backwards-compatibility rules in Catalyst converters (which conv

[jira] [Commented] (SPARK-6776) Implement backwards-compatibility rules in Catalyst converters (which convert Parquet record to rows)

2015-11-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009181#comment-15009181 ] Cheng Lian commented on SPARK-6776: --- It's 1.5.0, just added. > Implement backwards-compatibility ru

[jira] [Created] (SPARK-11785) When deployed against remote Hive metastore with lower versions, JDBC metadata calls throws exception

2015-11-17 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-11785: -- Summary: When deployed against remote Hive metastore with lower versions, JDBC metadata calls throws exception Key: SPARK-11785 URL: https://issues.apache.org/jira/browse/SPARK-11785

Re: Release Parquet 1.9.0

2015-11-17 Thread Cheng Lian
It would be nice if we can also have a new parquet-format release and include it in parquet-mr 1.9.0, so that the SLF4J dependency issue can be fixed. In this way, user libraries/applications will be able to redirect Parquet logs using the JUL bridge provided by SLF4J.

[jira] [Resolved] (SPARK-11044) Parquet writer version fixed as version1

2015-11-16 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-11044. Resolution: Fixed Fix Version/s: 1.7.0 Issue resolved by pull request 9060 [https

[jira] [Resolved] (SPARK-11692) Support for Parquet logical types, JSON and BSON (embedded types)

2015-11-16 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-11692. Resolution: Fixed Fix Version/s: 1.7.0 Issue resolved by pull request 9658 [https

[jira] [Updated] (SPARK-11044) Parquet writer version fixed as version1

2015-11-16 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-11044: --- Assignee: Hyukjin Kwon > Parquet writer version fixed as versi

[jira] [Updated] (SPARK-11692) Support for Parquet logical types, JSON and BSON (embedded types)

2015-11-16 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-11692: --- Assignee: Hyukjin Kwon > Support for Parquet logical types, JSON and BSON (embedded ty

[jira] [Updated] (SPARK-11044) Parquet writer version fixed as version1

2015-11-16 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-11044: --- Fix Version/s: 1.6.0 > Parquet writer version fixed as versi

[jira] [Comment Edited] (SPARK-11153) Turns off Parquet filter push-down for string and binary columns

2015-11-15 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005023#comment-15005023 ] Cheng Lian edited comment on SPARK-11153 at 11/15/15 11:43 AM: --- Good

[jira] [Resolved] (SPARK-11694) Parquet logical types are not being tested properly

2015-11-14 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-11694. Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 9660 [https

[jira] [Updated] (SPARK-11694) Parquet logical types are not being tested properly

2015-11-14 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-11694: --- Assignee: Hyukjin Kwon > Parquet logical types are not being tested prope

[jira] [Commented] (SPARK-11153) Turns off Parquet filter push-down for string and binary columns

2015-11-13 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005004#comment-15005004 ] Cheng Lian commented on SPARK-11153: Yes. > Turns off Parquet filter push-down for string and bin

[jira] [Issue Comment Deleted] (SPARK-11153) Turns off Parquet filter push-down for string and binary columns

2015-11-13 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-11153: --- Comment: was deleted (was: Yes.) > Turns off Parquet filter push-down for string and binary colu

[jira] [Commented] (SPARK-11153) Turns off Parquet filter push-down for string and binary columns

2015-11-13 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005005#comment-15005005 ] Cheng Lian commented on SPARK-11153: Yes. > Turns off Parquet filter push-down for string and bin

[jira] [Commented] (SPARK-11153) Turns off Parquet filter push-down for string and binary columns

2015-11-13 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005023#comment-15005023 ] Cheng Lian commented on SPARK-11153: Good question. We tried, see [PR #9225|https://github.com

[jira] [Resolved] (SPARK-11678) Partition discovery fail if there is a _SUCCESS file in the table's root dir

2015-11-13 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-11678. Resolution: Fixed Fix Version/s: 1.6.0 1.7.0 Issue resolved by pull

[jira] [Commented] (SPARK-5968) Parquet warning in spark-shell

2015-11-12 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001885#comment-15001885 ] Cheng Lian commented on SPARK-5968: --- As explained in the JIRA description, this issue shouldn't affect

[jira] [Resolved] (SPARK-11661) We should still pushdown filters returned by a data source's unhandledFilters

2015-11-12 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-11661. Resolution: Fixed Fix Version/s: 1.6.0 1.7.0 Issue resolved by pull

[jira] [Commented] (SPARK-11191) [1.5] Can't create UDF's using hive thrift service

2015-11-12 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003549#comment-15003549 ] Cheng Lian commented on SPARK-11191: Sorry that I wasn't clear enough in my previous reply. So

[jira] [Commented] (SPARK-11191) [1.5] Can't create UDF's using hive thrift service

2015-11-12 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002047#comment-15002047 ] Cheng Lian commented on SPARK-11191: This issue consists of two bugs. One of them is the ADD JAR

[jira] [Commented] (SPARK-11191) [1.5] Can't create UDF's using hive thrift service

2015-11-12 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003578#comment-15003578 ] Cheng Lian commented on SPARK-11191: What error message/exception stacktrace did you get when working

[jira] [Commented] (SPARK-11191) [1.5] Can't create UDF's using hive thrift service

2015-11-12 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002385#comment-15002385 ] Cheng Lian commented on SPARK-11191: Spark SQL hasn't supported persisted functions yet. > [

[jira] [Resolved] (SPARK-11500) Not deterministic order of columns when using merging schemas.

2015-11-11 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-11500. Resolution: Fixed Fix Version/s: 1.7.0 Issue resolved by pull request 9517 [https
