[jira] [Commented] (SPARK-16303) Update SQL examples and programming guide

2016-07-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362575#comment-15362575 ] Cheng Lian commented on SPARK-16303: Sure, thanks for volunteering! Actually, I've started working

[jira] [Updated] (SPARK-16360) Speed up SQL query performance by removing redundant `executePlan` call in `Dataset`

2016-07-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16360: --- Assignee: Dongjoon Hyun > Speed up SQL query performance by removing redundant `executePlan` c

[jira] [Resolved] (SPARK-16360) Speed up SQL query performance by removing redundant `executePlan` call in `Dataset`

2016-07-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16360. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14044 [https

[jira] [Resolved] (SPARK-15198) Support for filter push down for boolean types in ORC

2016-07-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-15198. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 12972 [https

[jira] [Updated] (SPARK-15198) Support for filter push down for boolean types in ORC

2016-07-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15198: --- Assignee: Hyukjin Kwon > Support for filter push down for boolean types in

[jira] [Updated] (PARQUET-651) Parquet-avro fails to decode array of record with a single field name "element" correctly

2016-07-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated PARQUET-651: --- Description: Found this issue while investigating SPARK-16344. For the following Parquet schema

[jira] [Updated] (SPARK-16208) Add `PropagateEmptyRelation` optimizer

2016-07-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16208: --- Assignee: Dongjoon Hyun (was: Apache Spark) > Add `PropagateEmptyRelation` optimi

[jira] [Resolved] (SPARK-16208) Add `PropagateEmptyRelation` optimizer

2016-07-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16208. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 13906 [https

[jira] [Commented] (SPARK-16317) Add file filtering interface for FileFormat

2016-07-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358975#comment-15358975 ] Cheng Lian commented on SPARK-16317: The motivation is to filter out input data files so

[jira] [Updated] (PARQUET-651) Parquet-avro fails to decode array of record with a single field name "element" correctly

2016-07-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated PARQUET-651: --- Description: Found this issue while investigating SPARK-16344. For the following Parquet schema

[jira] [Updated] (SPARK-16344) Array of struct with a single field name "element" can't be decoded from Parquet files written by Spark 1.6+

2016-07-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16344: --- Description: This is a weird corner case. Users may hit this issue if they have a schema that # has

[jira] [Updated] (SPARK-16344) Array of struct with a single field name "element" can't be decoded from Parquet files written by Spark 1.6+

2016-07-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16344: --- Description: Array of struct with a single field name "element" can't be decoded from Par

[jira] [Created] (SPARK-16344) Array of struct with a single field name "element" can't be decoded from Parquet files written by Spark 1.6+

2016-07-01 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-16344: -- Summary: Array of struct with a single field name "element" can't be decoded from Parquet files written by Spark 1.6+ Key: SPARK-16344 URL: https://issues.apache.org/j

[jira] [Updated] (SPARK-15820) Add Catalog.refreshTable into python API

2016-06-30 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15820: --- Fix Version/s: (was: 2.0.0) 2.1.0 2.0.1 >

[jira] [Updated] (SPARK-15820) Add Catalog.refreshTable into python API

2016-06-30 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15820: --- Assignee: Weichen Xu > Add Catalog.refreshTable into python

[jira] [Resolved] (SPARK-15820) Add Catalog.refreshTable into python API

2016-06-30 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-15820. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13558 [https

[jira] [Created] (SPARK-16317) Add file filtering interface for FileFormat

2016-06-30 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-16317: -- Summary: Add file filtering interface for FileFormat Key: SPARK-16317 URL: https://issues.apache.org/jira/browse/SPARK-16317 Project: Spark Issue Type

[jira] [Resolved] (SPARK-16134) optimizer rules for typed filter

2016-06-29 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16134. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 13846 [https

[jira] [Created] (SPARK-16303) Update SQL examples and programming guide

2016-06-29 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-16303: -- Summary: Update SQL examples and programming guide Key: SPARK-16303 URL: https://issues.apache.org/jira/browse/SPARK-16303 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-16295) Extract SQL programming guide example snippets from source files instead of hard code them

2016-06-29 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16295: --- Description: Currently, all example snippets in the SQL programming guide are hard-coded, which can

[jira] [Updated] (SPARK-16294) Labelling support for the include_example Jekyll plugin

2016-06-29 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16294: --- Issue Type: Sub-task (was: Improvement) Parent: SPARK-16295 > Labelling supp

[jira] [Created] (SPARK-16295) Extract SQL programming guide example snippets from source files instead of hard code them

2016-06-29 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-16295: -- Summary: Extract SQL programming guide example snippets from source files instead of hard code them Key: SPARK-16295 URL: https://issues.apache.org/jira/browse/SPARK-16295

[jira] [Updated] (SPARK-16294) Labelling support for the include_example Jekyll plugin

2016-06-29 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16294: --- Description: Part of the Spark programming guide pages are using the {{include_example}} Jekyll

[jira] [Updated] (SPARK-16294) Labelling support for the include_example Jekyll plugin

2016-06-29 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16294: --- Description: Part of the Spark programming guide pages are using the {{include_example}} Jekyll

[jira] [Created] (SPARK-16294) Labelling support for the include_example Jekyll plugin

2016-06-29 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-16294: -- Summary: Labelling support for the include_example Jekyll plugin Key: SPARK-16294 URL: https://issues.apache.org/jira/browse/SPARK-16294 Project: Spark Issue

[jira] [Created] (SPARK-16291) Invalid aggregate functions like MAX(COUNT(*)) are not captured by CheckAnalysis

2016-06-29 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-16291: -- Summary: Invalid aggregate functions like MAX(COUNT(*)) are not captured by CheckAnalysis Key: SPARK-16291 URL: https://issues.apache.org/jira/browse/SPARK-16291 Project

[jira] [Resolved] (SPARK-16100) Aggregator fails with Tungsten error when complex types are used for results and partial sum

2016-06-28 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16100. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Resolved] (SPARK-16221) Redirect Parquet JUL logger via SLF4J for WRITE operations

2016-06-27 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16221. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Updated] (SPARK-16221) Redirect Parquet JUL logger via SLF4J for WRITE operations

2016-06-27 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16221: --- Assignee: Dongjoon Hyun > Redirect Parquet JUL logger via SLF4J for WRITE operati

[jira] [Commented] (SPARK-16164) CombineFilters should keep the ordering in the logical plan

2016-06-27 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351244#comment-15351244 ] Cheng Lian commented on SPARK-16164: I'm not saying that we should make this explicit, but our

[jira] [Updated] (SPARK-10591) False negative in QueryTest.checkAnswer

2016-06-27 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10591: --- Fix Version/s: (was: 2.0.0) 2.1.0 2.0.1 > False negat

[jira] [Updated] (SPARK-10591) False negative in QueryTest.checkAnswer

2016-06-27 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10591: --- Assignee: Dongjoon Hyun > False negative in QueryTest.checkAns

[jira] [Resolved] (SPARK-10591) False negative in QueryTest.checkAnswer

2016-06-27 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-10591. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13913 [https

[jira] [Commented] (SPARK-16164) CombineFilters should keep the ordering in the logical plan

2016-06-23 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347701#comment-15347701 ] Cheng Lian commented on SPARK-16164: I'm posting a summary of our offline and GitHub discussion about

[jira] [Resolved] (SPARK-16165) Fix the update logic for InMemoryTableScanExec.readBatches accumulator

2016-06-23 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16165. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Updated] (SPARK-16165) Fix the update logic for InMemoryTableScanExec.readBatches accumulator

2016-06-23 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16165: --- Assignee: Dongjoon Hyun > Fix the update logic for InMemoryTableScanExec.readBatches accumula

[jira] [Updated] (SPARK-13709) Spark unable to decode Avro when partitioned

2016-06-23 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-13709: --- Assignee: Cheng Lian > Spark unable to decode Avro when partitio

[jira] [Updated] (SPARK-13572) HiveContext reads avro Hive tables incorrectly

2016-06-22 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-13572: --- Description: I am using PySpark to read avro-based tables from Hive and while the avro tables can

[jira] [Updated] (SPARK-16100) Aggregator fails with Tungsten error when complex types are used for results and partial sum

2016-06-22 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16100: --- Description: I get a similar error when using complex types in Aggregator. Not sure

[jira] [Updated] (SPARK-16100) Aggregator fails with Tungsten error when complex types are used for results and partial sum

2016-06-22 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16100: --- Assignee: Wenchen Fan > Aggregator fails with Tungsten error when complex types are used for resu

[jira] [Resolved] (SPARK-16097) Encoders.tuple should handle null object correctly

2016-06-22 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16097. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13807 [https

[jira] [Updated] (SPARK-16121) ListingFileCatalog does not list in parallel anymore

2016-06-22 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16121: --- Assignee: Yin Huai > ListingFileCatalog does not list in parallel anym

[jira] [Resolved] (SPARK-16121) ListingFileCatalog does not list in parallel anymore

2016-06-22 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16121. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13830 [https

[jira] [Commented] (SPARK-16032) Audit semantics of various insertion operations related to partitioned tables

2016-06-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341754#comment-15341754 ] Cheng Lian commented on SPARK-16032: [~rdblue], I also migrated some test cases from your PR so

[jira] [Commented] (SPARK-16032) Audit semantics of various insertion operations related to partitioned tables

2016-06-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341658#comment-15341658 ] Cheng Lian commented on SPARK-16032: Hey [~rdblue], [~yhuai] and [~cloud_fan] had already covered

[jira] [Resolved] (SPARK-15894) Add doc to control #partition for input files

2016-06-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-15894. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13797 [https

[jira] [Updated] (SPARK-15894) Add doc to control #partition for input files

2016-06-20 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15894: --- Assignee: Takeshi Yamamuro > Add doc to control #partition for input fi

[jira] [Resolved] (SPARK-16030) Allow specifying static partitions in an INSERT statement for data source tables

2016-06-20 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16030. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13769 [https

Re: Spark 2.0 Dataset Documentation

2016-06-17 Thread Cheng Lian
. Should I take discussion to your PR? Pedro On Fri, Jun 17, 2016 at 11:12 PM, Cheng Lian <lian.cs@gmail.com> wrote: Hey Pedro, SQL programming guide is being updated. Here's the PR, but not merged yet: https://github.com/apache/spar

Re: Spark 2.0 Dataset Documentation

2016-06-17 Thread Cheng Lian
Hey Pedro, SQL programming guide is being updated. Here's the PR, but not merged yet: https://github.com/apache/spark/pull/13592 Cheng On 6/17/16 9:13 PM, Pedro Rodriguez wrote: Hi All, At my workplace we are starting to use Datasets in 1.6.1 and even more with Spark 2.0 in place of

[jira] [Updated] (SPARK-16033) DataFrameWriter.partitionBy() can't be used together with DataFrameWriter.insertInto()

2016-06-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16033: --- Issue Type: Sub-task (was: Bug) Parent: SPARK-16032 > DataFrameWriter.partitionBy() ca

[jira] [Created] (SPARK-16033) DataFrameWriter.partitionBy() can't be used together with DataFrameWriter.insertInto()

2016-06-17 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-16033: -- Summary: DataFrameWriter.partitionBy() can't be used together with DataFrameWriter.insertInto() Key: SPARK-16033 URL: https://issues.apache.org/jira/browse/SPARK-16033

[jira] [Updated] (SPARK-16030) Allow specifying static partitions in an INSERT statement for data source tables

2016-06-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16030: --- Assignee: Yin Huai > Allow specifying static partitions in an INSERT statement for data sou

[jira] [Created] (SPARK-16032) Audit semantics of various insertion operations related to partitioned tables

2016-06-17 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-16032: -- Summary: Audit semantics of various insertion operations related to partitioned tables Key: SPARK-16032 URL: https://issues.apache.org/jira/browse/SPARK-16032 Project

[jira] [Updated] (SPARK-16030) Allow specifying static partitions in an INSERT statement for data source tables

2016-06-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-16030: --- Issue Type: Sub-task (was: Bug) Parent: SPARK-16032 > Allow specifying static partiti

[jira] [Updated] (SPARK-15916) JDBC AND/OR operator push down does not respect lower OR operator precedence

2016-06-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15916: --- Description: A table from SQL server Northwind database was registered as a JDBC dataframe. A query

[jira] [Updated] (SPARK-15916) JDBC AND/OR operator push down does not respect lower OR operator precedence

2016-06-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15916: --- Assignee: Hyukjin Kwon > JDBC AND/OR operator push down does not respect lower OR opera

[jira] [Resolved] (SPARK-15916) JDBC AND/OR operator push down does not respect lower OR operator precedence

2016-06-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-15916. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13743 [https
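The SPARK-15916 title describes a precedence problem: when a filter tree is pushed down as SQL text, an OR nested under an AND must be parenthesized, because OR binds more loosely than AND in SQL. The following is a minimal sketch of that idea only. It is not Spark's JDBC data source code, and the `Pred`, `And`, `Or`, and `compile_filter` names are hypothetical, introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pred:
    """A leaf predicate already rendered as SQL text (hypothetical type)."""
    sql: str


@dataclass
class And:
    left: object
    right: object


@dataclass
class Or:
    left: object
    right: object


def compile_filter(node):
    """Render a filter tree as a WHERE clause, parenthesizing both operands
    of every AND/OR so the tree shape survives SQL operator precedence."""
    if isinstance(node, Pred):
        return node.sql
    if isinstance(node, And):
        op = "AND"
    elif isinstance(node, Or):
        op = "OR"
    else:
        raise TypeError(f"unsupported filter node: {node!r}")
    return f"({compile_filter(node.left)}) {op} ({compile_filter(node.right)})"


# (a = 1 OR b = 2) AND c = 3
tree = And(Or(Pred("a = 1"), Pred("b = 2")), Pred("c = 3"))
where_clause = compile_filter(tree)

# Without the parentheses the same tree would read "a = 1 OR b = 2 AND c = 3",
# which SQL parses as "a = 1 OR (b = 2 AND c = 3)", a different predicate.
naive_clause = "a = 1 OR b = 2 AND c = 3"
```

Parenthesizing every operand is more verbose than strictly necessary, but it avoids reasoning about precedence case by case.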

Re: List of Additions to Parquet 2

2016-06-16 Thread Cheng Lian
One problem of Parquet user-defined key/value metadata is that, when merging footers of multiple Parquet files to generate the summary files, if two Parquet files have key/value entries with the same key but different values, Parquet doesn't know how to merge them, and simply throws an
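The merge behaviour described in this message can be illustrated with a small, hedged sketch. It models each footer's user-defined key/value metadata as a plain dictionary and fails as soon as two files disagree on a key. This is not parquet-mr's implementation; the function name and the sample keys are assumptions made for illustration.

```python
def merge_key_value_metadata(footers):
    """Strictly merge per-file key/value metadata.

    Raises ValueError when the same key maps to different values in two
    footers, mirroring the "cannot merge, so fail" behaviour described above.
    """
    merged = {}
    for metadata in footers:
        for key, value in metadata.items():
            if key in merged and merged[key] != value:
                raise ValueError(
                    f"conflicting values for key {key!r}: "
                    f"{merged[key]!r} vs {value!r}"
                )
            merged[key] = value
    return merged


# Two hypothetical footers that agree on one key and disagree on another.
file_a = {"writer.version": "1", "owner": "team-a"}
file_b = {"writer.version": "1", "owner": "team-b"}

try:
    merge_key_value_metadata([file_a, file_b])
    conflict = None
except ValueError as err:
    conflict = str(err)
```

A more lenient policy (for example, keeping all distinct values per key) is possible, but it changes the type of the merged metadata, which is one reason a strict merge tends to be the default.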

[jira] [Resolved] (SPARK-15862) Better Error Message When Having Database Name in CACHE TABLE AS SELECT

2016-06-16 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-15862. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13572 [https

[jira] [Resolved] (SPARK-15786) joinWith bytecode generation calling ByteBuffer.wrap with InternalRow

2016-06-16 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-15786. Resolution: Duplicate Fix Version/s: 2.0.0 > joinWith bytecode generation call

[jira] [Updated] (SPARK-15786) joinWith bytecode generation calling ByteBuffer.wrap with InternalRow

2016-06-16 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15786: --- Assignee: Sean Zhong > joinWith bytecode generation calling ByteBuffer.wrap with Internal

[jira] [Updated] (SPARK-15983) Remove FileFormat.prepareRead()

2016-06-15 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15983: --- Summary: Remove FileFormat.prepareRead() (was: Remove FileFormat.prepareRead) > Rem

[jira] [Updated] (SPARK-15983) Remove FileFormat.prepareRead

2016-06-15 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15983: --- Description: Interface method {{FileFormat.prepareRead()}} was added in [PR #12088|https

[jira] [Created] (SPARK-15983) Remove FileFormat.prepareRead

2016-06-15 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-15983: -- Summary: Remove FileFormat.prepareRead Key: SPARK-15983 URL: https://issues.apache.org/jira/browse/SPARK-15983 Project: Spark Issue Type: Improvement

Re: Hive 1.0.0 not able to read Spark 1.6.1 parquet output files on EMR 4.7.0

2016-06-15 Thread Cheng Lian
Spark 1.6.1 is also using 1.7.0. Could you please share the schema of your Parquet file as well as the exact exception stack trace reported by Hive? Cheng On 6/13/16 12:56 AM, mayankshete wrote: Hello Team , I am facing an issue where output files generated by Spark 1.6.1 are not read by

[jira] [Updated] (SPARK-15901) Test Cases for CONVERT_METASTORE_ORC and CONVERT_METASTORE_PARQUET

2016-06-15 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15901: --- Assignee: Xiao Li > Test Cases for CONVERT_METASTORE_ORC and CONVERT_METASTORE_PARQ

[jira] [Resolved] (SPARK-15901) Test Cases for CONVERT_METASTORE_ORC and CONVERT_METASTORE_PARQUET

2016-06-15 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-15901. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13622 [https

[jira] [Commented] (SPARK-14953) LocalBackend should revive offers periodically

2016-06-15 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332560#comment-15332560 ] Cheng Lian commented on SPARK-14953: Marked this as won't fix since this isn't causing any actual

[jira] [Resolved] (SPARK-14953) LocalBackend should revive offers periodically

2016-06-15 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-14953. Resolution: Won't Fix > LocalBackend should revive offers periodica

Re: update mysql in spark

2016-06-15 Thread Cheng Lian
Spark SQL doesn't support update command yet. On Wed, Jun 15, 2016, 9:08 AM spR wrote: > hi, > > can we write a update query using sqlcontext? > > sqlContext.sql("update act1 set loc = round(loc,4)") > > what is wrong in this? I get the following error. > >

[jira] [Resolved] (SPARK-15929) DataFrameSuite path globbing error message tests are not fully portable

2016-06-13 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-15929. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13649 [https

[jira] [Commented] (SPARK-15931) SparkR tests failing on R 3.3.0

2016-06-13 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328403#comment-15328403 ] Cheng Lian commented on SPARK-15931: cc [~mengxr] > SparkR tests failing on R 3.

[jira] [Created] (SPARK-15931) SparkR tests failing on R 3.3.0

2016-06-13 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-15931: -- Summary: SparkR tests failing on R 3.3.0 Key: SPARK-15931 URL: https://issues.apache.org/jira/browse/SPARK-15931 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-15925) Replaces registerTempTable with createOrReplaceTempView in SparkR

2016-06-13 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-15925: -- Summary: Replaces registerTempTable with createOrReplaceTempView in SparkR Key: SPARK-15925 URL: https://issues.apache.org/jira/browse/SPARK-15925 Project: Spark

[jira] [Reopened] (SPARK-15639) Try to push down filter at RowGroups level for parquet reader

2016-06-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian reopened SPARK-15639: We've decided to revert the merged PR, so reopening it. > Try to push down filter at RowGroups le

[jira] [Updated] (SPARK-15639) Try to push down filter at RowGroups level for parquet reader

2016-06-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15639: --- Assignee: Liang-Chi Hsieh > Try to push down filter at RowGroups level for parquet rea

[jira] [Updated] (SPARK-15639) Try to push down filter at RowGroups level for parquet reader

2016-06-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15639: --- Affects Version/s: 2.0.0 > Try to push down filter at RowGroups level for parquet rea

[jira] [Resolved] (SPARK-15639) Try to push down filter at RowGroups level for parquet reader

2016-06-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-15639. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13371 [https

[jira] [Resolved] (SPARK-15884) Override stringArgs method in MapPartitionsInR case class in order to avoid Out Of Memory exceptions when calling toString

2016-06-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-15884. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13610 [https

[jira] [Updated] (SPARK-15884) Override stringArgs method in MapPartitionsInR case class in order to avoid Out Of Memory exceptions when calling toString

2016-06-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15884: --- Assignee: Narine Kokhlikyan > Override stringArgs method in MapPartitionsInR case class in or

[jira] [Updated] (SPARK-15862) Better Error Message When Having Database Name in CACHE TABLE AS SELECT

2016-06-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15862: --- Assignee: Xiao Li > Better Error Message When Having Database Name in CACHE TABLE AS SEL

[jira] [Resolved] (SPARK-15753) Move some Analyzer stuff to Analyzer from DataFrameWriter

2016-06-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-15753. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13496 [https

[jira] [Updated] (SPARK-15856) Revert API breaking changes made in DataFrameReader.text and SQLContext.range

2016-06-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15856: --- Assignee: Wenchen Fan > Revert API breaking changes made in DataFrameReader.t

[jira] [Updated] (SPARK-15753) Move some Analyzer stuff to Analyzer from DataFrameWriter

2016-06-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15753: --- Assignee: Liang-Chi Hsieh > Move some Analyzer stuff to Analyzer from DataFrameWri

[jira] [Created] (SPARK-15863) Update SQL programming guide for Spark 2.0

2016-06-09 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-15863: -- Summary: Update SQL programming guide for Spark 2.0 Key: SPARK-15863 URL: https://issues.apache.org/jira/browse/SPARK-15863 Project: Spark Issue Type

[jira] [Updated] (SPARK-15856) Revert API breaking changes made in DataFrameReader.text and SQLContext.range

2016-06-09 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15856: --- Description: In Spark 2.0, after unifying Datasets and DataFrames, we made two API breaking changes

[jira] [Updated] (SPARK-15856) Revert API breaking changes made in DataFrameReader.text and SQLContext.range

2016-06-09 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15856: --- Description: In Spark 2.0, after unifying Datasets and DataFrames, we made two API breaking changes

[jira] [Created] (SPARK-15856) Revert API breaking changes made in DataFrameReader.text and SQLContext.range

2016-06-09 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-15856: -- Summary: Revert API breaking changes made in DataFrameReader.text and SQLContext.range Key: SPARK-15856 URL: https://issues.apache.org/jira/browse/SPARK-15856 Project

[jira] [Resolved] (SPARK-15792) [SQL] Allows operator to change the verbosity in explain output.

2016-06-07 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-15792. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13535 [https

[jira] [Resolved] (SPARK-15632) Dataset typed filter operation changes query plan schema

2016-06-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-15632. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13529 [https

[jira] [Commented] (SPARK-15632) Dataset typed filter operation changes query plan schema

2016-06-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317460#comment-15317460 ] Cheng Lian commented on SPARK-15632: The {{.map(identity)}} example is quite interesting, thanks

[jira] [Updated] (SPARK-15654) Reading gzipped files results in duplicate rows

2016-06-06 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15654: --- Assignee: Takeshi Yamamuro > Reading gzipped files results in duplicate r

[jira] [Resolved] (SPARK-15657) RowEncoder should validate the data type of input object

2016-06-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-15657. Resolution: Fixed Fix Version/s: 2.0.0 Resolved by https://github.com/apache/spark/pull

[jira] [Updated] (SPARK-15632) Dataset typed filter operation changes query plan schema

2016-06-03 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15632: --- Assignee: Sean Zhong > Dataset typed filter operation changes query plan sch

[jira] [Commented] (SPARK-15140) encoder should make sure input object is not null

2016-06-03 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314853#comment-15314853 ] Cheng Lian commented on SPARK-15140: Issue resolved by pull request 13469 [https://github.com/apache

[jira] [Resolved] (SPARK-15140) encoder should make sure input object is not null

2016-06-03 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-15140. Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > encoder sho

[jira] [Resolved] (SPARK-15547) Encoder validation is too strict for inner nested structs

2016-06-03 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-15547. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13474 [https

[jira] [Resolved] (SPARK-15494) encoder code cleanup

2016-06-03 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-15494. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13269 [https

[jira] [Resolved] (SPARK-14959) Problem Reading partitioned ORC or Parquet files

2016-06-02 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-14959. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13463 [https
