[GitHub] spark pull request: Problem select empty ORC table

2016-05-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13103#issuecomment-219187010 I think it would be good if you follow https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide. --- If your project is set up for it, you can

[GitHub] spark pull request: [SPARK-15325][SQL] Replace the usage of deprec...

2016-05-14 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/13113 [SPARK-15325][SQL] Replace the usage of deprecated DataSet API in tests (Scala/Java) ## What changes were proposed in this pull request? It seems `unionAll(other: Dataset[T

[GitHub] spark pull request: Fix reading of partitioned format=text dataset...

2016-05-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13104#issuecomment-219207342 Oh, I just meant it changes the code to support partitioned tables for the text data source, which seems to be disabled in Spark 2.0. It seems the guide says it does not a JIRA

[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...

2016-05-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13112#discussion_r63272945 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/stopwatches.scala --- @@ -19,7 +19,8 @@ package org.apache.spark.ml.util import

[GitHub] spark pull request: [SPARK-15323] Fix reading of partitioned forma...

2016-05-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13104#issuecomment-219207979 Let me cc @liancheng

[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...

2016-05-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13165#discussion_r63667236 --- Diff: R/pkg/R/client.R --- @@ -43,6 +43,17 @@ determineSparkSubmitBin <- function() { sparkSubmitBinName } +# R supports b

[GitHub] spark pull request: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python...

2016-05-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13135#issuecomment-220009767 I see. Thanks! I will change them tomorrow.

[GitHub] spark pull request: [SPARK-15323][SQL] Fix reading of partitioned ...

2016-05-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13104#issuecomment-219969148 @jurriaan It might be nicer if the title is `[SPARK-15323][SPARK-14463][SQL] ...` if this fixes the issue as well (So that both can be closed as you might already

[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...

2016-05-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13165#issuecomment-219955778 Please let me cc @sun-rui and @JoshRosen

[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...

2016-05-18 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/13165 [SPARK-8603][SPARKR] Incorrect file separator passed to Java and Scripts from R in windows ## What changes were proposed in this pull request? This PR corrects R file separator

[GitHub] spark pull request: [SPARK-15031][EXAMPLE] Use SparkSession in exa...

2016-05-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13164#issuecomment-219956200 Oh, I didn't even know there were some more.

[GitHub] spark pull request: [SPARK-15264][SQL] CSV Reader Error on Blank C...

2016-05-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13041#discussion_r62793802 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/DefaultSource.scala --- @@ -61,7 +61,9 @@ class DefaultSource extends

[GitHub] spark pull request: [SPARK-15267][SQL] Refactor and add some class...

2016-05-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13048#discussion_r63117779 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [TRIVIAL][Doc] SparkSession class doc example ...

2016-05-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13086#issuecomment-218922187 (Just to make sure, it seems this is the only one across the Spark code) ```bash grep -r "SparkSession.builder()" . | grep ".scala"

[GitHub] spark pull request: [SPARK-15267][SQL] Refactor and add some class...

2016-05-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13048#discussion_r63120588 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4131] [SQL] Support INSERT OVERWRITE [L...

2016-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13067#discussion_r62963753 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoDir.scala --- @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4131] [SQL] Support INSERT OVERWRITE [L...

2016-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13067#discussion_r62963841 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoDir.scala --- @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4131] [SQL] Support INSERT OVERWRITE [L...

2016-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13067#discussion_r62963804 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoDir.scala --- @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-15315][SQL] Adding error check to the C...

2016-05-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13105#discussion_r63274194 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/DefaultSource.scala --- @@ -172,4 +173,13 @@ class DefaultSource

[GitHub] spark pull request: [SPARK-15324] [SQL] Add the takeSample functio...

2016-05-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13116#discussion_r63274228 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -18,6 +18,9 @@ package org.apache.spark.sql import

[GitHub] spark pull request: [SPARK-15324] [SQL] Add the takeSample functio...

2016-05-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13116#discussion_r63274238 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -402,6 +402,76 @@ class DatasetSuite extends QueryTest

[GitHub] spark pull request: [SPARK-15324] [SQL] Add the takeSample functio...

2016-05-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13116#discussion_r63274249 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -402,6 +402,76 @@ class DatasetSuite extends QueryTest

[GitHub] spark pull request: [SPARK-12492] Using spark-sql commond to run q...

2016-05-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13115#issuecomment-219213787 (I think it would be nicer if the PR description were filled in.)

[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12855#issuecomment-216436047 @rxin Sure, I will. Thanks.

[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12855#issuecomment-216435745 @rxin I thought so but I haven't tested yet. Could I look into that after this one is merged, maybe?

[GitHub] spark pull request: [SPARK-14997][SQL] Fixed FileCatalog to return...

2016-05-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12856#issuecomment-216441995 (Maybe adding "Closes #12774" in the description?)

[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

2016-05-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12601#issuecomment-216445374 @rxin I also realised the Python API supports properties as a dict having " arbitrary string tag/value", [here](https://github.com/apache/

[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...

2016-05-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11724#issuecomment-216465994 @rxin Sure, I will add a more explicit description and some tests for this. Thanks.

[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12855#discussion_r61863832 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala --- @@ -239,48 +239,50 @@ private[sql] class

[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12855#discussion_r61863848 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala --- @@ -363,84 +365,87 @@ private[sql] class

[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12855#discussion_r61863843 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala --- @@ -363,84 +365,87 @@ private[sql] class

[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...

2016-05-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11724#issuecomment-216499340 @rxin I added some more commits for unit tests in `CSVInferSchemaSuite`.

[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12855#issuecomment-216491871 @rxin I could find the same issue in the internal data sources. I just added the same logic and a test in `HadoopFsRelationTest`.

[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...

2016-05-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11724#issuecomment-216704194 @rxin I see. Thank you. Let me fix this up and change the description as well with some rules for `LongType`, `DoubleType` and `DecimalType`.

[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12818#issuecomment-216733974 Hi @falaki, could you take a quick look? It won't take too long!

[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12818#issuecomment-216734653 Please allow me to cc you, @jbax, who I guess is the author of the Univocity parser. Could you please confirm that `Format.setComment()` is not affected if we only call

[GitHub] spark pull request: [SPARK-14914] Fix Resource not closed after us...

2016-05-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12693#discussion_r61987018 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala --- @@ -640,12 +640,14 @@ class CheckpointSuite extends

[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12818#issuecomment-216740931 @jbax Cool! Thank you for the detailed explanation. So, this uses the OS default newline without `setLineSeparator()`, which is trimmed [here](https://github.com

[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12818#issuecomment-216743678 @jbax Ah, I guess `foo` and `bar` are separate rows, right? `stripLineEnd` will be applied for each row. If I got you wrong and `setLineSeparator

[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10943#issuecomment-217461632 ping @cloud-fan

[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10655#issuecomment-217465276 ping @RussellSpitzer

[GitHub] spark pull request: [SPARK-12177] [STREAMING] Update KafkaDStreams...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10953#issuecomment-217477017 @markgrover Mind adding `Closes #10681` to the PR description so that the merge script can close that one as well?

[GitHub] spark pull request: [SPARK-15176][Core] Add maxShares setting to P...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12951#discussion_r62329769 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala --- @@ -90,10 +92,11 @@ private[spark] class FairSchedulableBuilder

[GitHub] spark pull request: [SPARK-15176][Core] Add maxShares setting to P...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12951#discussion_r62330358 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Pool.scala --- @@ -21,6 +21,7 @@ import java.util.concurrent.{ConcurrentHashMap

[GitHub] spark pull request: [SPARK-15176][Core] Add maxShares setting to P...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12951#discussion_r62330957 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Pool.scala --- @@ -47,6 +49,15 @@ private[spark] class Pool( var name = poolName

[GitHub] spark pull request: [SPARK-15176][Core] Add maxShares setting to P...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12951#discussion_r62330929 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -98,6 +98,14 @@ private[spark] class TaskSetManager( var

[GitHub] spark pull request: [SPARK-15176][Core] Add maxShares setting to P...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12951#discussion_r62331312 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Pool.scala --- @@ -21,6 +21,7 @@ import java.util.concurrent.{ConcurrentHashMap

[GitHub] spark pull request: [SPARK-15074][Shuffle] Cache shuffle index fil...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12944#discussion_r62331757 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ShuffleIndexRecord.java --- @@ -0,0 +1,39 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-15198][SQL] Support for pushing down fi...

2016-05-06 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/12972 [SPARK-15198][SQL] Support for pushing down filters for boolean types in ORC data source ## What changes were proposed in this pull request? This PR adds the support for pushing

[GitHub] spark pull request: [SPARK-15198][SQL] Support for pushing down fi...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12972#issuecomment-217601711 Let me please cc @liancheng and also @tedyu who suggested this change.

[GitHub] spark pull request: [SPARK-12639] [SQL] Mark Filters Fully Handled...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11317#issuecomment-217602140 @RussellSpitzer I saw you answered my ping before. Excuse my ping here again.

[GitHub] spark pull request: [SPARK-15125][SQL] Changing CSV data source ma...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12904#issuecomment-217604986 @rxin @sureshthalamati Do you mind holding off this change until #12921 is merged? That PR also handles `nullValue`. Apparently, I guess `nullValue` could affect

[GitHub] spark pull request: [SPARK-15125][SQL] Changing CSV data source ma...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12904#discussion_r62411095 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -555,4 +558,37 @@ class CSVSuite extends

[GitHub] spark pull request: [SPARK-15125][SQL] Changing CSV data source ma...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12904#issuecomment-217605160 Here is how I think the CSV data source should handle `""`, empty strings and `nullValue`. With the option, `nullValue` set to `"

[GitHub] spark pull request: [SPARK-15125][SQL] Changing CSV data source ma...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12904#issuecomment-217605838 In case of writing, I think ``` Row("", "null", null) ``` should produce the CSV as below: 1. With the opt

[GitHub] spark pull request: [SPARK-14814][MLlib] API: Java compatibility, ...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12971#discussion_r62411488 --- Diff: mllib/src/test/java/org/apache/spark/mllib/tree/JavaDecisionTreeSuite.java --- @@ -21,6 +21,8 @@ import java.util.HashMap; import

[GitHub] spark pull request: [SPARK-13425][SQL] Documentation for CSV datas...

2016-05-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12817#issuecomment-216084433 @rxin I am so sorry, I think I totally misunderstood your initial comments before. I just addressed your comments later. Thank you.

[GitHub] spark pull request: [SPARK-14800][SQL] Dealing with null as a valu...

2016-05-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12629#issuecomment-216086878 Hi @davies @viirya, if you are not sure about handling `null`s, I can close this in the meantime. But this PR includes - adding `OrcOptions` just like

[GitHub] spark pull request: [SPARK-14997]Files in subdirectories are incor...

2016-05-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12774#issuecomment-216098660 @gatorsmile Does that maybe imply closing this for now and making a JIRA or sending an email to the dev mailing list in order to discuss this further?

[GitHub] spark pull request: [SPARK-13425][SQL] Documentation for CSV datas...

2016-05-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12817#issuecomment-216090207 Sure. Thank you. Do you want me to remove Python documentation here?

[GitHub] spark pull request: [SPARK-15050][SQL] Put CSV options as Python c...

2016-05-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12834#issuecomment-216112173 cc @rxin

[GitHub] spark pull request: [SPARK-15050][SQL] Put CSV options as Python c...

2016-05-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12834#issuecomment-216112196 (@rxin Do you want me to do this for `json()` as well?)

[GitHub] spark pull request: [SPARK-15050][SQL] Put CSV options as Python c...

2016-05-02 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/12834 [SPARK-15050][SQL] Put CSV options as Python csv function parameters ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-15050

[GitHub] spark pull request: [SPARK-13425][SQL] Documentation for CSV datas...

2016-05-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12817#issuecomment-216090703 @rxin I see. Thank you.

[GitHub] spark pull request: [SPARK-14800][SQL] Dealing with null as a valu...

2016-05-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12629#issuecomment-216086947 cc @rxin as well.

[GitHub] spark pull request: [SPARK-15050][SQL] Put CSV and JSON options as...

2016-05-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12834#discussion_r61716763 --- Diff: python/pyspark/sql/readwriter.py --- @@ -177,31 +180,35 @@ def json(self, path, schema=None): :param path: string represents path

[GitHub] spark pull request: [SPARK-15050][SQL] Put CSV and JSON options as...

2016-05-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12834#discussion_r61708031 --- Diff: python/pyspark/sql/readwriter.py --- @@ -274,48 +274,44 @@ def text(self, paths): return self._df(self._jreader.text(self

[GitHub] spark pull request: [MINOR][DOC] Fix wrong type in python examples

2016-05-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12868#issuecomment-216848307 I think we had better change the title and description. From your comments, I guess it is not a wrong type but depends on the Python version. So, I think

[GitHub] spark pull request: [SPARK-15143][SPARK-15144][SQL] Add CSV tests ...

2016-05-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12921#discussion_r62159750 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -192,59 +192,59 @@ private[csv] object

[GitHub] spark pull request: [SPARK-15141][DOC] Add python example for OneV...

2016-05-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12920#issuecomment-217105494 I am not sure if fixing examples can have the component `[DOC]` in the title. I saw `[EXAMPLE]` component was used by @dongjoon-hyun. This is a pretty minor

[GitHub] spark pull request: [SPARK-15149][DOC] include python example for ...

2016-05-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12925#issuecomment-217105714 @dongjoon-hyun This one as well. Do you mind if I ask your thoughts on the component in the title? Making good examples for PRs will help all other contributors

[GitHub] spark pull request: [SPARK-15125][SQL] New option to the CSV data ...

2016-05-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12904#issuecomment-217103186 +1 for treating them as empty strings. I guess this will conflict with https://github.com/apache/spark/pull/12921 because that PR deals with some bug

[GitHub] spark pull request: [SPARK-15143][SPARK-15144][SQL] Add CSV tests ...

2016-05-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12921#discussion_r62164414 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVTypeCastSuite.scala --- @@ -73,10 +73,10 @@ class CSVTypeCastSuite

[GitHub] spark pull request: [SPARK-15150][DOC] Add python example for LDA

2016-05-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12927#issuecomment-217105871 @dongjoon-hyun Sorry for cc'ing you a lot, but it would be great if I could hear your thoughts.

[GitHub] spark pull request: [SPARK-15143][SPARK-15144][SQL] Add CSV tests ...

2016-05-04 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/12921 [SPARK-15143][SPARK-15144][SQL] Add CSV tests with HadoopFsRelationTest and support for nullValue for other types ## What changes were proposed in this pull request? Currently

[GitHub] spark pull request: [SPARK-15148][SQL] Upgrade Univocity library f...

2016-05-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12923#issuecomment-217086289 cc @rxin and @jbax (who is the author of Univocity library and suggested this change).

[GitHub] spark pull request: [SPARK-15143][SPARK-15144][SQL] Add CSV tests ...

2016-05-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12921#discussion_r62151223 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -447,7 +446,7 @@ class CSVSuite extends

[GitHub] spark pull request: [SPARK-15143][SPARK-15144][SQL] Add CSV tests ...

2016-05-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12921#issuecomment-217082924 cc @rxin @falaki

[GitHub] spark pull request: [SPARK-15148][SQL] Upgrade Univocity library f...

2016-05-05 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/12923 [SPARK-15148][SQL] Upgrade Univocity library from 2.0.2 to 2.1.0 ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-15148 Mainly

[GitHub] spark pull request: [SPARK-15148][SQL] Upgrade Univocity library f...

2016-05-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12923#issuecomment-217094506 Thanks, @holdenk! It is not urgent. I can do this in this PR.

[GitHub] spark pull request: [SPARK-15085][Streaming][Kafka] Rename streami...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12946#issuecomment-217373085 (@koeninger it seems the last part of the title is truncated)

[GitHub] spark pull request: [SPARK-14962][SQL] Do not push down isnotnull/...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12777#issuecomment-217391825 Hi @yhuai Would you mind taking a look at this, please?

[GitHub] spark pull request: [SPARK-15245][SQL] Stream API throws an except...

2016-05-10 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/13021

[GitHub] spark pull request: [SPARK-15245][SQL] Stream API throws an except...

2016-05-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13021#issuecomment-218073322 I am closing this after talking with @srowen in the JIRA ticket.

[GitHub] spark pull request: [SPARK-15245][SQL] Stream API throws an except...

2016-05-09 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/13021 [SPARK-15245][SQL] Stream API throws an exception for non-directory path with incorrect message. ## What changes were proposed in this pull request? https://issues.apache.org/jira

[GitHub] spark pull request: [SPARK-12200][SQL] Add __contains__ implementa...

2016-05-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10194#issuecomment-218082133 Maybe @davies, because I see most of the code was written by him.

[GitHub] spark pull request: [SPARK-15087][CORE][SQL] Remove AccumulatorV2....

2016-05-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12865#issuecomment-216527112 (This is pretty minor, but I think `cc @rxin` can be removed and put in the comments instead, because the PR description explains the PR itself and the names of reviewers

[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12818#issuecomment-216772060 Thank you @jbax. I will try to do so after checking if I can identify any useful changes with Spark.

[GitHub] spark pull request: [MINOR][DOC] Fix wrong type in mllib.gaussian_...

2016-05-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12868#discussion_r61996965 --- Diff: examples/src/main/python/mllib/gaussian_mixture_model.py --- @@ -49,7 +49,7 @@ def parseVector(line): parser.add_argument

[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12818#issuecomment-216764368 @rxin In terms of functionality and performance, no. But it shortens the code, and I thought it was confusing whether the `comment` option in `CSVOptions` affects

[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12818#issuecomment-216765450 @rxin If it is too minor to merge, I can close and then do this in another PR maybe after investigating the newline stuff discussed above.

[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-04 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/12818

[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12818#issuecomment-216765934 Closing this. I will bring this up again, maybe in https://github.com/apache/spark/pull/12268.

[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12818#issuecomment-216748780 Oh, I misunderstood your first comment. I think I should not take out `setLineSeparator()` here, but maybe I should open another issue ticket to set

[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12818#issuecomment-216750176 retest this please

[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12818#issuecomment-216763118 @rxin To cut it short, I got confirmation from the original author of Univocity that `setComment()` has no effect as long as Spark does not write comments from

[GitHub] spark pull request: [SPARK-15050][SQL] Put CSV and JSON options as...

2016-05-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12834#discussion_r61745877 --- Diff: python/pyspark/sql/readwriter.py --- @@ -258,64 +283,73 @@ def parquet(self, *paths): @ignore_unicode_prefix @since(1.6

[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-02 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/12855 [SPARK-10216][SQL] Avoid creating empty files during overwrite into Hive table with group by query ## What changes were proposed in this pull request? Currently, `INSERT

[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12855#issuecomment-216411891 I submitted this PR because #8411 looks abandoned and it looks like the author has not answered since the last comment by a committer. (It has been inactive for almost half

[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12855#issuecomment-216411956 @yhuai Could you please take a look?
