[jira] [Created] (SPARK-20805) updated updateP in SVD++ is error

2017-05-18 Thread BoLing (JIRA)
BoLing created SPARK-20805: -- Summary: updated updateP in SVD++ is error Key: SPARK-20805 URL: https://issues.apache.org/jira/browse/SPARK-20805 Project: Spark Issue Type: Bug Components:

[jira] [Updated] (SPARK-20803) KernelDensity.estimate in pyspark.mllib.stat.KernelDensity throws net.razorvine.pickle.PickleException when input data is normally distributed (no error when data is not

2017-05-18 Thread Bettadapura Srinath Sharma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bettadapura Srinath Sharma updated SPARK-20803: --- Description: When data is NOT normally distributed (correct

[jira] [Created] (SPARK-20804) Join with null safe equality fails with AnalysisException

2017-05-18 Thread koert kuipers (JIRA)
koert kuipers created SPARK-20804: - Summary: Join with null safe equality fails with AnalysisException Key: SPARK-20804 URL: https://issues.apache.org/jira/browse/SPARK-20804 Project: Spark

[jira] [Assigned] (SPARK-20801) Store accurate size of blocks in MapStatus when it's above threshold.

2017-05-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20801: Assignee: Apache Spark > Store accurate size of blocks in MapStatus when it's above

[jira] [Updated] (SPARK-20801) Store accurate size of blocks in MapStatus when it's above threshold.

2017-05-18 Thread jin xing (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jin xing updated SPARK-20801: - Issue Type: Sub-task (was: Improvement) Parent: SPARK-19659 > Store accurate size of blocks in

[jira] [Assigned] (SPARK-20801) Store accurate size of blocks in MapStatus when it's above threshold.

2017-05-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20801: Assignee: (was: Apache Spark) > Store accurate size of blocks in MapStatus when it's

[jira] [Commented] (SPARK-20801) Store accurate size of blocks in MapStatus when it's above threshold.

2017-05-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016813#comment-16016813 ] Apache Spark commented on SPARK-20801: -- User 'jinxing64' has created a pull request for this issue:

[jira] [Commented] (SPARK-18838) High latency of event processing for large jobs

2017-05-18 Thread Ruslan Dautkhanov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016664#comment-16016664 ] Ruslan Dautkhanov commented on SPARK-18838: --- My 2 cents: it would be nice to explore the idea of

[jira] [Commented] (SPARK-18838) High latency of event processing for large jobs

2017-05-18 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016663#comment-16016663 ] Josh Rosen commented on SPARK-18838: [~bOOmX], do you have CPU-time profiling within each of those

[jira] [Created] (SPARK-20803) KernelDensity.estimate in pyspark.mllib.stat.KernelDensity throws net.razorvine.pickle.PickleException when input data is normally distributed (no error when data is not

2017-05-18 Thread Bettadapura Srinath Sharma (JIRA)
Bettadapura Srinath Sharma created SPARK-20803: -- Summary: KernelDensity.estimate in pyspark.mllib.stat.KernelDensity throws net.razorvine.pickle.PickleException when input data is normally distributed (no error when data is not

[jira] [Commented] (SPARK-20784) Spark hangs (v2.0) or Futures timed out (v2.1) after a joinWith() and cache() in YARN client mode

2017-05-18 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016600#comment-16016600 ] Hyukjin Kwon commented on SPARK-20784: -- Thank you for checking this out. > Spark hangs (v2.0) or

[jira] [Commented] (SPARK-18838) High latency of event processing for large jobs

2017-05-18 Thread Antoine PRANG (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016579#comment-16016579 ] Antoine PRANG commented on SPARK-18838: --- [~joshrosen][~sitalke...@gmail.com] I have measured the

[jira] [Commented] (SPARK-19076) Upgrade Hive dependence to Hive 2.x

2017-05-18 Thread William Handy (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016309#comment-16016309 ] William Handy commented on SPARK-19076: --- It seems like it was decided that this was too difficult,

[jira] [Commented] (SPARK-20389) Upgrade kryo to fix NegativeArraySizeException

2017-05-18 Thread Louis Bergelson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016305#comment-16016305 ] Louis Bergelson commented on SPARK-20389: - What's the process for evaluating the effect on spark

[jira] [Updated] (SPARK-20663) Data missing after insert overwrite table partition which is created on specific location

2017-05-18 Thread kobefeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kobefeng updated SPARK-20663: - Affects Version/s: (was: 2.1.0) 2.1.1 > Data missing after insert overwrite

[jira] [Updated] (SPARK-20663) Data missing after insert overwrite table partition which is created on specific location

2017-05-18 Thread kobefeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kobefeng updated SPARK-20663: - Labels: (was: easyfix) > Data missing after insert overwrite table partition which is created on >

[jira] [Commented] (SPARK-16627) --jars doesn't work in Mesos mode

2017-05-18 Thread Michael Gummelt (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016253#comment-16016253 ] Michael Gummelt commented on SPARK-16627: - I'm not completely sure, but I believe that the

[jira] [Created] (SPARK-20802) kolmogorovSmirnovTest in pyspark.mllib.stat.Statistics throws net.razorvine.pickle.PickleException when input data is normally distributed (no error when data is not nor

2017-05-18 Thread Bettadapura Srinath Sharma (JIRA)
Bettadapura Srinath Sharma created SPARK-20802: -- Summary: kolmogorovSmirnovTest in pyspark.mllib.stat.Statistics throws net.razorvine.pickle.PickleException when input data is normally distributed (no error when data is not normally

[jira] [Commented] (SPARK-12559) Cluster mode doesn't work with --packages

2017-05-18 Thread Michael Gummelt (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016243#comment-16016243 ] Michael Gummelt commented on SPARK-12559: - I changed the title from "Standalone cluster mode" to

[jira] [Updated] (SPARK-12559) Cluster mode doesn't work with --packages

2017-05-18 Thread Michael Gummelt (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-12559: Summary: Cluster mode doesn't work with --packages (was: Standalone cluster mode doesn't

[jira] [Commented] (SPARK-18838) High latency of event processing for large jobs

2017-05-18 Thread Sital Kedia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016217#comment-16016217 ] Sital Kedia commented on SPARK-18838: - [~joshrosen] - >> Alternatively, we could use two queues, one

[jira] [Commented] (SPARK-20776) Fix JobProgressListener perf. problems caused by empty TaskMetrics initialization

2017-05-18 Thread Ruslan Dautkhanov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016168#comment-16016168 ] Ruslan Dautkhanov commented on SPARK-20776: --- Thank you [~joshrosen]. Would it be possible to

[jira] [Resolved] (SPARK-20364) Parquet predicate pushdown on columns with dots return empty results

2017-05-18 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-20364. - Resolution: Fixed Fix Version/s: 2.2.0 > Parquet predicate pushdown on columns with dots return

[jira] [Assigned] (SPARK-20364) Parquet predicate pushdown on columns with dots return empty results

2017-05-18 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-20364: --- Assignee: Hyukjin Kwon > Parquet predicate pushdown on columns with dots return empty results >

[jira] [Commented] (SPARK-20768) PySpark FPGrowth does not expose numPartitions (expert) param

2017-05-18 Thread yuhao yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016116#comment-16016116 ] yuhao yang commented on SPARK-20768: Thanks for the ping. [~mlnick] We should just treat it as an

[jira] [Commented] (SPARK-20797) mllib lda's LocalLDAModel's save: out of memory.

2017-05-18 Thread yuhao yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016061#comment-16016061 ] yuhao yang commented on SPARK-20797: [~d0evi1] Thanks for reporting the issue and proposal for the

[jira] [Assigned] (SPARK-20796) the location of start-master.sh in spark-standalone.md is wrong

2017-05-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-20796: - Assignee: liuzhaokun > the location of start-master.sh in spark-standalone.md is wrong >

[jira] [Resolved] (SPARK-20796) the location of start-master.sh in spark-standalone.md is wrong

2017-05-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20796. --- Resolution: Fixed Fix Version/s: 2.1.2 2.2.0 Issue resolved by pull

[jira] [Resolved] (SPARK-20779) The ASF header placed in an incorrect location in some files

2017-05-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20779. --- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 18012

[jira] [Assigned] (SPARK-20779) The ASF header placed in an incorrect location in some files

2017-05-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-20779: - Assignee: zuotingbing > The ASF header placed in an incorrect location in some files >

[jira] [Commented] (SPARK-17867) Dataset.dropDuplicates (i.e. distinct) should consider the columns with same column name

2017-05-18 Thread Mitesh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016021#comment-16016021 ] Mitesh commented on SPARK-17867: Ah I see, thanks [~viirya]. The repartitionByColumns is just a short-cut

[jira] [Commented] (SPARK-14492) Spark SQL 1.6.0 does not work with external Hive metastore version lower than 1.2.0; its not backwards compatible with earlier version

2017-05-18 Thread Brian Albright (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015964#comment-16015964 ] Brian Albright commented on SPARK-14492: Ran into this over the past week. Here are some gists

[jira] [Created] (SPARK-20801) Store accurate size of blocks in MapStatus when it's above threshold.

2017-05-18 Thread jin xing (JIRA)
jin xing created SPARK-20801: Summary: Store accurate size of blocks in MapStatus when it's above threshold. Key: SPARK-20801 URL: https://issues.apache.org/jira/browse/SPARK-20801 Project: Spark

[jira] [Commented] (SPARK-20796) the location of start-master.sh in spark-standalone.md is wrong

2017-05-18 Thread liuzhaokun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015914#comment-16015914 ] liuzhaokun commented on SPARK-20796: @Sean Owen I am so sorry about it, and I will take your advice.

[jira] [Closed] (SPARK-20784) Spark hangs (v2.0) or Futures timed out (v2.1) after a joinWith() and cache() in YARN client mode

2017-05-18 Thread Mathieu D (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu D closed SPARK-20784. - Resolution: Not A Bug Oh boy, it was an OOM on the driver. Most of the time, it was silent. I just

[jira] [Commented] (SPARK-20782) Dataset's isCached operator

2017-05-18 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015872#comment-16015872 ] Wenchen Fan commented on SPARK-20782: - one alternative is to create a temp view and cache the view,

[jira] [Commented] (SPARK-17867) Dataset.dropDuplicates (i.e. distinct) should consider the columns with same column name

2017-05-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015852#comment-16015852 ] Liang-Chi Hsieh commented on SPARK-17867: - The above example code can't compile with current

[jira] [Updated] (SPARK-20775) from_json should also have an API where the schema is specified with a string

2017-05-18 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-20775: - Issue Type: Improvement (was: Bug) > from_json should also have an API where the schema

[jira] [Commented] (SPARK-20775) from_json should also have an API where the schema is specified with a string

2017-05-18 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015819#comment-16015819 ] Takeshi Yamamuro commented on SPARK-20775: -- Since I feel this is not a bug, I'll change the issue

[jira] [Updated] (SPARK-20797) mllib lda's LocalLDAModel's save: out of memory.

2017-05-18 Thread d0evi1 (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] d0evi1 updated SPARK-20797: --- Summary: mllib lda's LocalLDAModel's save: out of memory. (was: mllib lda load and save out of memory. )

[jira] [Updated] (SPARK-20797) mllib lda load and save out of memory.

2017-05-18 Thread d0evi1 (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] d0evi1 updated SPARK-20797: --- Description: When I try the online LDA model with large text data (nearly 1 billion Chinese news abstracts), the

[jira] [Commented] (SPARK-20797) mllib lda load and save out of memory.

2017-05-18 Thread d0evi1 (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015813#comment-16015813 ] d0evi1 commented on SPARK-20797: Sorry for my poor English. I rewrote the problem. Just one topic:

[jira] [Commented] (SPARK-20782) Dataset's isCached operator

2017-05-18 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015810#comment-16015810 ] Takeshi Yamamuro commented on SPARK-20782: -- This short-cut makes some sense to me. WDYT? cc:

[jira] [Updated] (SPARK-20800) Allow users to set job group when connecting through the SQL thrift server

2017-05-18 Thread Tim Zeyl (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Zeyl updated SPARK-20800: - Description: It would be useful for users to be able to set the job group through thrift server clients

[jira] [Commented] (SPARK-20747) Distinct in Aggregate Functions

2017-05-18 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015776#comment-16015776 ] Takeshi Yamamuro commented on SPARK-20747: -- You mean this query below? {code} scala> Seq((1, 1),

[jira] [Created] (SPARK-20800) Allow users to set job group when connecting through the SQL thrift server

2017-05-18 Thread Tim Zeyl (JIRA)
Tim Zeyl created SPARK-20800: Summary: Allow users to set job group when connecting through the SQL thrift server Key: SPARK-20800 URL: https://issues.apache.org/jira/browse/SPARK-20800 Project: Spark

[jira] [Comment Edited] (SPARK-17867) Dataset.dropDuplicates (i.e. distinct) should consider the columns with same column name

2017-05-18 Thread Mitesh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015772#comment-16015772 ] Mitesh edited comment on SPARK-17867 at 5/18/17 1:48 PM: - I'm seeing a regression

[jira] [Comment Edited] (SPARK-17867) Dataset.dropDuplicates (i.e. distinct) should consider the columns with same column name

2017-05-18 Thread Mitesh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015772#comment-16015772 ] Mitesh edited comment on SPARK-17867 at 5/18/17 1:49 PM: - I'm seeing a regression

[jira] [Commented] (SPARK-17867) Dataset.dropDuplicates (i.e. distinct) should consider the columns with same column name

2017-05-18 Thread Mitesh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015772#comment-16015772 ] Mitesh commented on SPARK-17867: I'm seeing a regression from this change, the last filter gets pushed

[jira] [Comment Edited] (SPARK-17867) Dataset.dropDuplicates (i.e. distinct) should consider the columns with same column name

2017-05-18 Thread Mitesh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015772#comment-16015772 ] Mitesh edited comment on SPARK-17867 at 5/18/17 1:47 PM: - I'm seeing a regression

[jira] [Updated] (SPARK-20798) GenerateUnsafeProjection should check if value is null before calling the getter

2017-05-18 Thread Ala Luszczak (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ala Luszczak updated SPARK-20798: - Description: GenerateUnsafeProjection.writeStructToBuffer() does not honor the assumption that

[jira] [Assigned] (SPARK-20798) GenerateUnsafeProjection should check if value is null before calling the getter

2017-05-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20798: Assignee: Apache Spark > GenerateUnsafeProjection should check if value is null before

[jira] [Assigned] (SPARK-20798) GenerateUnsafeProjection should check if value is null before calling the getter

2017-05-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20798: Assignee: (was: Apache Spark) > GenerateUnsafeProjection should check if value is

[jira] [Commented] (SPARK-20798) GenerateUnsafeProjection should check if value is null before calling the getter

2017-05-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015754#comment-16015754 ] Apache Spark commented on SPARK-20798: -- User 'ala' has created a pull request for this issue:

[jira] [Created] (SPARK-20799) Unable to infer schema for ORC on reading ORC from S3

2017-05-18 Thread Jork Zijlstra (JIRA)
Jork Zijlstra created SPARK-20799: - Summary: Unable to infer schema for ORC on reading ORC from S3 Key: SPARK-20799 URL: https://issues.apache.org/jira/browse/SPARK-20799 Project: Spark

[jira] [Assigned] (SPARK-20168) Enable kinesis to start stream from Initial position specified by a timestamp

2017-05-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20168: Assignee: Apache Spark > Enable kinesis to start stream from Initial position specified

[jira] [Assigned] (SPARK-20168) Enable kinesis to start stream from Initial position specified by a timestamp

2017-05-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20168: Assignee: (was: Apache Spark) > Enable kinesis to start stream from Initial position

[jira] [Commented] (SPARK-20168) Enable kinesis to start stream from Initial position specified by a timestamp

2017-05-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015706#comment-16015706 ] Apache Spark commented on SPARK-20168: -- User 'yssharma' has created a pull request for this issue:

[jira] [Commented] (SPARK-14864) [MLLIB] Implement Doc2Vec

2017-05-18 Thread Rajdeep Mondal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015693#comment-16015693 ] Rajdeep Mondal commented on SPARK-14864: Sorry to bother. Any progress on this? > [MLLIB]

[jira] [Commented] (SPARK-20797) mllib lda load and save out of memory.

2017-05-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015657#comment-16015657 ] Sean Owen commented on SPARK-20797: --- It's not clear what you're describing here. Can you reduce this to

[jira] [Updated] (SPARK-20798) GenerateUnsafeProjection should check if value is null before calling the getter

2017-05-18 Thread Ala Luszczak (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ala Luszczak updated SPARK-20798: - Description: GenerateUnsafeProjection.writeStructToBuffer() does not honor the assumption that

[jira] [Updated] (SPARK-20796) the location of start-master.sh in spark-standalone.md is wrong

2017-05-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-20796: -- Priority: Trivial (was: Major) [~liuzhaokun] please don't open a JIRA for these. They're trivial.

[jira] [Created] (SPARK-20798) GenerateUnsafeProjection should check if value is null before calling the getter

2017-05-18 Thread Ala Luszczak (JIRA)
Ala Luszczak created SPARK-20798: Summary: GenerateUnsafeProjection should check if value is null before calling the getter Key: SPARK-20798 URL: https://issues.apache.org/jira/browse/SPARK-20798

[jira] [Created] (SPARK-20797) mllib lda load and save out of memory.

2017-05-18 Thread d0evi1 (JIRA)
d0evi1 created SPARK-20797: -- Summary: mllib lda load and save out of memory. Key: SPARK-20797 URL: https://issues.apache.org/jira/browse/SPARK-20797 Project: Spark Issue Type: Bug

[jira] [Assigned] (SPARK-20796) the location of start-master.sh in spark-standalone.md is wrong

2017-05-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20796: Assignee: Apache Spark > the location of start-master.sh in spark-standalone.md is wrong

[jira] [Assigned] (SPARK-20796) the location of start-master.sh in spark-standalone.md is wrong

2017-05-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20796: Assignee: (was: Apache Spark) > the location of start-master.sh in

[jira] [Commented] (SPARK-20796) the location of start-master.sh in spark-standalone.md is wrong

2017-05-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015631#comment-16015631 ] Apache Spark commented on SPARK-20796: -- User 'liu-zhaokun' has created a pull request for this

[jira] [Created] (SPARK-20796) the location of start-master.sh in spark-standalone.md is wrong

2017-05-18 Thread liuzhaokun (JIRA)
liuzhaokun created SPARK-20796: -- Summary: the location of start-master.sh in spark-standalone.md is wrong Key: SPARK-20796 URL: https://issues.apache.org/jira/browse/SPARK-20796 Project: Spark

[jira] [Commented] (SPARK-19275) Spark Streaming, Kafka receiver, "Failed to get records for ... after polling for 512"

2017-05-18 Thread Aaquib Khwaja (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015591#comment-16015591 ] Aaquib Khwaja commented on SPARK-19275: --- Hi [~dmitry_iii], I also ran into a similar issue. I've

[jira] [Resolved] (SPARK-20795) Maximum document frequency for IDF

2017-05-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20795. --- Resolution: Invalid Please start this as a question on the mailing list. > Maximum document

[jira] [Commented] (SPARK-20795) Maximum document frequency for IDF

2017-05-18 Thread Turan Gojayev (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015580#comment-16015580 ] Turan Gojayev commented on SPARK-20795: --- I am a total newbie here, so excuse me if I've set

[jira] [Updated] (SPARK-20795) Maximum document frequency for IDF

2017-05-18 Thread Turan Gojayev (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Turan Gojayev updated SPARK-20795: -- Description: In the current implementation of IDF there is no way to set the maximum number of

[jira] [Created] (SPARK-20795) Maximum document frequency for IDF

2017-05-18 Thread Turan Gojayev (JIRA)
Turan Gojayev created SPARK-20795: - Summary: Maximum document frequency for IDF Key: SPARK-20795 URL: https://issues.apache.org/jira/browse/SPARK-20795 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-20784) Spark hangs (v2.0) or Futures timed out (v2.1) after a joinWith() and cache() in YARN client mode

2017-05-18 Thread Mathieu D (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu D updated SPARK-20784: -- Affects Version/s: 2.1.1 Description: Spark hangs and stops executing any job or task

[jira] [Commented] (SPARK-20768) PySpark FPGrowth does not expose numPartitions (expert) param

2017-05-18 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015455#comment-16015455 ] Nick Pentreath commented on SPARK-20768: Sure - though perhaps [~yuhaoyan] can give an opinion

[jira] [Comment Edited] (SPARK-20768) PySpark FPGrowth does not expose numPartitions (expert) param

2017-05-18 Thread 颜发才
[ https://issues.apache.org/jira/browse/SPARK-20768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015448#comment-16015448 ] Yan Facai (颜发才) edited comment on SPARK-20768 at 5/18/17 8:59 AM: -- It

[jira] [Commented] (SPARK-20768) PySpark FPGrowth does not expose numPartitions (expert) param

2017-05-18 Thread 颜发才
[ https://issues.apache.org/jira/browse/SPARK-20768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015448#comment-16015448 ] Yan Facai (颜发才) commented on SPARK-20768: - It seems easy, I can work on it. > PySpark FPGrowth

[jira] [Commented] (SPARK-20768) PySpark FPGrowth does not expose numPartitions (expert) param

2017-05-18 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015440#comment-16015440 ] Nick Pentreath commented on SPARK-20768: It is there - but not documented as a {{Param}} and so

[jira] [Commented] (SPARK-20768) PySpark FPGrowth does not expose numPartitions (expert) param

2017-05-18 Thread 颜发才
[ https://issues.apache.org/jira/browse/SPARK-20768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015437#comment-16015437 ] Yan Facai (颜发才) commented on SPARK-20768: - Hi, I'm a newbie. `numPartitions` is found in pyspark

[jira] [Commented] (SPARK-16202) Misleading Description of CreatableRelationProvider's createRelation

2017-05-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015386#comment-16015386 ] Apache Spark commented on SPARK-16202: -- User 'jaceklaskowski' has created a pull request for this

[jira] [Commented] (SPARK-20506) ML, Graph 2.2 QA: Programming guide update and migration guide

2017-05-18 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015382#comment-16015382 ] Nick Pentreath commented on SPARK-20506: Oh also SPARK-14503 is important > ML, Graph 2.2 QA:

[jira] [Commented] (SPARK-20506) ML, Graph 2.2 QA: Programming guide update and migration guide

2017-05-18 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015362#comment-16015362 ] Nick Pentreath commented on SPARK-20506: Cool - I've added a section before the Migration Guide

[jira] [Commented] (SPARK-19581) running NaiveBayes model with 0 features can crash the executor with D rorreGEMV

2017-05-18 Thread 颜发才
[ https://issues.apache.org/jira/browse/SPARK-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015345#comment-16015345 ] Yan Facai (颜发才) commented on SPARK-19581: - [~barrybecker4] Hi, Becker. I can't reproduce the bug

[jira] [Resolved] (SPARK-20794) Spark show() command on dataset does not retrieve consistent rows from DASHDB data source

2017-05-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20794. --- Resolution: Invalid It's a question, so belongs on the mailing list. I think it's a DASHDB

[jira] [Created] (SPARK-20794) Spark show() command on dataset does not retrieve consistent rows from DASHDB data source

2017-05-18 Thread Sahana HA (JIRA)
Sahana HA created SPARK-20794: - Summary: Spark show() command on dataset does not retrieve consistent rows from DASHDB data source Key: SPARK-20794 URL: https://issues.apache.org/jira/browse/SPARK-20794

[jira] [Closed] (SPARK-20793) cache table will not refresh after insert data to some broadcast table

2017-05-18 Thread du (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] du closed SPARK-20793. -- Resolution: Not A Problem > cache table will not refresh after insert data to some broadcast table >

[jira] [Updated] (SPARK-20793) cache table will not refresh after insert data to some broadcast table

2017-05-18 Thread du (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] du updated SPARK-20793: --- Description: Run the SQL below in spark-sql or beeline: create table t4(c1 int,c2 int); insert into table t4 select 1,2;

[jira] [Created] (SPARK-20793) cache table will not refresh after insert data to some broadcast table

2017-05-18 Thread du (JIRA)
du created SPARK-20793: -- Summary: cache table will not refresh after insert data to some broadcast table Key: SPARK-20793 URL: https://issues.apache.org/jira/browse/SPARK-20793 Project: Spark Issue

[jira] [Updated] (SPARK-20700) InferFiltersFromConstraints stackoverflows for query (v2)

2017-05-18 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-20700: Fix Version/s: 2.2.0 > InferFiltersFromConstraints stackoverflows for query (v2) >

[jira] [Resolved] (SPARK-20700) InferFiltersFromConstraints stackoverflows for query (v2)

2017-05-18 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-20700. - Resolution: Fixed > InferFiltersFromConstraints stackoverflows for query (v2) >