[jira] [Commented] (SPARK-17479) Fix LDA example in docs

2016-09-09 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15479240#comment-15479240 ] zhengruifeng commented on SPARK-17479: -- +1 I tested this example in Scala, Java, Py2 and Py3. And all

[jira] [Assigned] (SPARK-17449) Relation between heartbeatInterval and network timeout

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17449: Assignee: Apache Spark > Relation between heartbeatInterval and network timeout >

[jira] [Assigned] (SPARK-17449) Relation between heartbeatInterval and network timeout

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17449: Assignee: (was: Apache Spark) > Relation between heartbeatInterval and network

[jira] [Commented] (SPARK-17449) Relation between heartbeatInterval and network timeout

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15479237#comment-15479237 ] Apache Spark commented on SPARK-17449: -- User 'jagadeesanas2' has created a pull request for this

[jira] [Created] (SPARK-17489) Improve filtering for bucketed tables

2016-09-09 Thread Shuai Lin (JIRA)
Shuai Lin created SPARK-17489: - Summary: Improve filtering for bucketed tables Key: SPARK-17489 URL: https://issues.apache.org/jira/browse/SPARK-17489 Project: Spark Issue Type: Improvement

[jira] [Assigned] (SPARK-17488) TakeAndOrder will OOM when the data is very large

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17488: Assignee: (was: Apache Spark) > TakeAndOrder will OOM when the data is very large >

[jira] [Commented] (SPARK-17488) TakeAndOrder will OOM when the data is very large

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15479183#comment-15479183 ] Apache Spark commented on SPARK-17488: -- User 'cenyuhai' has created a pull request for this issue:

[jira] [Assigned] (SPARK-17488) TakeAndOrder will OOM when the data is very large

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17488: Assignee: Apache Spark > TakeAndOrder will OOM when the data is very large >

[jira] [Created] (SPARK-17488) TakeAndOrder will OOM when the data is very large

2016-09-09 Thread cen yuhai (JIRA)
cen yuhai created SPARK-17488: - Summary: TakeAndOrder will OOM when the data is very large Key: SPARK-17488 URL: https://issues.apache.org/jira/browse/SPARK-17488 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-17445) Reference an ASF page as the main place to find third-party packages

2016-09-09 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15479121#comment-15479121 ] Matei Zaharia commented on SPARK-17445: --- The powered by wiki page is a bit of a mess IMO, so I'd

[jira] [Commented] (SPARK-17450) spark sql rownumber OOM

2016-09-09 Thread cen yuhai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15479097#comment-15479097 ] cen yuhai commented on SPARK-17450: --- can you provide me davies's pr? > spark sql rownumber OOM >

[jira] [Comment Edited] (SPARK-17487) Configurable bucketing info extraction

2016-09-09 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15479042#comment-15479042 ] Tejas Patil edited comment on SPARK-17487 at 9/10/16 3:39 AM: -- I have a WIP

[jira] [Assigned] (SPARK-17487) Configurable bucketing info extraction

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17487: Assignee: (was: Apache Spark) > Configurable bucketing info extraction >

[jira] [Commented] (SPARK-17487) Configurable bucketing info extraction

2016-09-09 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15479042#comment-15479042 ] Tejas Patil commented on SPARK-17487: - I have a WIP for this. I am looking for early feedback wrt

[jira] [Assigned] (SPARK-17487) Configurable bucketing info extraction

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17487: Assignee: Apache Spark > Configurable bucketing info extraction >

[jira] [Commented] (SPARK-17487) Configurable bucketing info extraction

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15479043#comment-15479043 ] Apache Spark commented on SPARK-17487: -- User 'tejasapatil' has created a pull request for this

[jira] [Created] (SPARK-17487) Configurable bucketing info extraction

2016-09-09 Thread Tejas Patil (JIRA)
Tejas Patil created SPARK-17487: --- Summary: Configurable bucketing info extraction Key: SPARK-17487 URL: https://issues.apache.org/jira/browse/SPARK-17487 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-17447) performance improvement in Partitioner.DefaultPartitioner

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478926#comment-15478926 ] Apache Spark commented on SPARK-17447: -- User 'codlife' has created a pull request for this issue:

[jira] [Updated] (SPARK-15453) Improve join planning for bucketed / sorted tables

2016-09-09 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-15453: Assignee: Tejas Patil > Improve join planning for bucketed / sorted tables >

[jira] [Resolved] (SPARK-15453) Improve join planning for bucketed / sorted tables

2016-09-09 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-15453. - Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14864

[jira] [Commented] (SPARK-17468) Cluster workers crushed when master network bad more than one WORKER_TIMEOUT_MS!

2016-09-09 Thread zhangzhiyan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478846#comment-15478846 ] zhangzhiyan commented on SPARK-17468: - some of my worker died because of memory exceed hardware

[jira] [Commented] (SPARK-17400) MinMaxScaler.transform() outputs DenseVector by default, which causes poor performance

2016-09-09 Thread Frank Dai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478819#comment-15478819 ] Frank Dai commented on SPARK-17400: --- [~mlnick] After reading the doc of MaxAbsScaler, I think

[jira] [Closed] (SPARK-17400) MinMaxScaler.transform() outputs DenseVector by default, which causes poor performance

2016-09-09 Thread Frank Dai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frank Dai closed SPARK-17400. - Resolution: Not A Problem > MinMaxScaler.transform() outputs DenseVector by default, which causes poor

[jira] [Assigned] (SPARK-17486) Remove unused TaskMetricsUIData.updatedBlockStatuses field

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17486: Assignee: Josh Rosen (was: Apache Spark) > Remove unused

[jira] [Commented] (SPARK-17486) Remove unused TaskMetricsUIData.updatedBlockStatuses field

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478817#comment-15478817 ] Apache Spark commented on SPARK-17486: -- User 'JoshRosen' has created a pull request for this issue:

[jira] [Assigned] (SPARK-17486) Remove unused TaskMetricsUIData.updatedBlockStatuses field

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17486: Assignee: Apache Spark (was: Josh Rosen) > Remove unused

[jira] [Created] (SPARK-17486) Remove unused TaskMetricsUIData.updatedBlockStatuses field

2016-09-09 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-17486: -- Summary: Remove unused TaskMetricsUIData.updatedBlockStatuses field Key: SPARK-17486 URL: https://issues.apache.org/jira/browse/SPARK-17486 Project: Spark Issue

[jira] [Commented] (SPARK-17476) Proper handling for unseen labels in logistic regression training.

2016-09-09 Thread Xin Ren (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478517#comment-15478517 ] Xin Ren commented on SPARK-17476: - Hi I can try to work on this one, thanks :) > Proper handling for

[jira] [Comment Edited] (SPARK-16239) SQL issues with cast from date to string around daylight savings time

2016-09-09 Thread Dean Wampler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478438#comment-15478438 ] Dean Wampler edited comment on SPARK-16239 at 9/9/16 10:20 PM: --- I

[jira] [Commented] (SPARK-16239) SQL issues with cast from date to string around daylight savings time

2016-09-09 Thread Dean Wampler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478438#comment-15478438 ] Dean Wampler commented on SPARK-16239: -- I investigated this a bit today for a customer. I could not

[jira] [Commented] (SPARK-14221) Cross-publish Chill for Scala 2.12

2016-09-09 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478428#comment-15478428 ] Jakob Odersky commented on SPARK-14221: --- I just saw that chill already [has a pending PR to upgrade

[jira] [Commented] (SPARK-16834) TrainValildationSplit and direct evaluation produce different scores

2016-09-09 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478414#comment-15478414 ] Bryan Cutler commented on SPARK-16834: -- [~mmoroz], your sample doesn't quite do the same thing as

[jira] [Assigned] (SPARK-17485) Failed remote cached block reads can lead to whole job failure

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17485: Assignee: Apache Spark (was: Josh Rosen) > Failed remote cached block reads can lead to

[jira] [Assigned] (SPARK-17485) Failed remote cached block reads can lead to whole job failure

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17485: Assignee: Josh Rosen (was: Apache Spark) > Failed remote cached block reads can lead to

[jira] [Commented] (SPARK-17485) Failed remote cached block reads can lead to whole job failure

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478407#comment-15478407 ] Apache Spark commented on SPARK-17485: -- User 'JoshRosen' has created a pull request for this issue:

[jira] [Resolved] (SPARK-17469) mapWithState causes block lock warning

2016-09-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-17469. --- Resolution: Duplicate > mapWithState causes block lock warning >

[jira] [Commented] (SPARK-17469) mapWithState causes block lock warning

2016-09-09 Thread Miao Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478339#comment-15478339 ] Miao Wang commented on SPARK-17469: --- Can you give command for reproduction? > mapWithState causes

[jira] [Commented] (SPARK-16026) Cost-based Optimizer framework

2016-09-09 Thread Srinath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478341#comment-15478341 ] Srinath commented on SPARK-16026: - Thanks for the response: 1. You’re correct that the search space will

[jira] [Created] (SPARK-17485) Failed remote cached block reads can lead to whole job failure

2016-09-09 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-17485: -- Summary: Failed remote cached block reads can lead to whole job failure Key: SPARK-17485 URL: https://issues.apache.org/jira/browse/SPARK-17485 Project: Spark

[jira] [Updated] (SPARK-17485) Failed remote cached block reads can lead to whole job failure

2016-09-09 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-17485: --- Priority: Critical (was: Major) > Failed remote cached block reads can lead to whole job failure >

[jira] [Resolved] (SPARK-17354) java.lang.ClassCastException: java.lang.Integer cannot be cast to java.sql.Date

2016-09-09 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-17354. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Created] (SPARK-17484) Race condition when cancelling a job during a cache write can lead to block fetch failures

2016-09-09 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-17484: -- Summary: Race condition when cancelling a job during a cache write can lead to block fetch failures Key: SPARK-17484 URL: https://issues.apache.org/jira/browse/SPARK-17484

[jira] [Updated] (SPARK-17477) SparkSQL cannot handle schema evolution from Int -> Long when parquet files have Int as its type while hive metastore has Long as its type

2016-09-09 Thread Gang Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated SPARK-17477: Shepherd: (was: Gang Wu) > SparkSQL cannot handle schema evolution from Int -> Long when parquet files

[jira] [Assigned] (SPARK-17483) Minor refactoring and cleanup in BlockManager block status reporting and block removal

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17483: Assignee: Apache Spark (was: Josh Rosen) > Minor refactoring and cleanup in BlockManager

[jira] [Commented] (SPARK-17483) Minor refactoring and cleanup in BlockManager block status reporting and block removal

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478208#comment-15478208 ] Apache Spark commented on SPARK-17483: -- User 'JoshRosen' has created a pull request for this issue:

[jira] [Assigned] (SPARK-17483) Minor refactoring and cleanup in BlockManager block status reporting and block removal

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17483: Assignee: Josh Rosen (was: Apache Spark) > Minor refactoring and cleanup in BlockManager

[jira] [Assigned] (SPARK-17477) SparkSQL cannot handle schema evolution from Int -> Long when parquet files have Int as its type while hive metastore has Long as its type

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17477: Assignee: Apache Spark > SparkSQL cannot handle schema evolution from Int -> Long when

[jira] [Commented] (SPARK-17477) SparkSQL cannot handle schema evolution from Int -> Long when parquet files have Int as its type while hive metastore has Long as its type

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478196#comment-15478196 ] Apache Spark commented on SPARK-17477: -- User 'wgtmac' has created a pull request for this issue:

[jira] [Assigned] (SPARK-17477) SparkSQL cannot handle schema evolution from Int -> Long when parquet files have Int as its type while hive metastore has Long as its type

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17477: Assignee: (was: Apache Spark) > SparkSQL cannot handle schema evolution from Int ->

[jira] [Created] (SPARK-17483) Minor refactoring and cleanup in BlockManager block status reporting and block removal

2016-09-09 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-17483: -- Summary: Minor refactoring and cleanup in BlockManager block status reporting and block removal Key: SPARK-17483 URL: https://issues.apache.org/jira/browse/SPARK-17483

[jira] [Created] (SPARK-17482) Analyzer should be able run on top of optimized rule

2016-09-09 Thread Davies Liu (JIRA)
Davies Liu created SPARK-17482: -- Summary: Analyzer should be able run on top of optimized rule Key: SPARK-17482 URL: https://issues.apache.org/jira/browse/SPARK-17482 Project: Spark Issue Type:

[jira] [Commented] (SPARK-16240) model loading backward compatibility for ml.clustering.LDA

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478139#comment-15478139 ] Apache Spark commented on SPARK-16240: -- User 'jkbradley' has created a pull request for this issue:

[jira] [Commented] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib

2016-09-09 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478135#comment-15478135 ] Yun Ni commented on SPARK-5992: --- Thank you very much for reviewing it, Joseph! I will work on the first

[jira] [Commented] (SPARK-15573) Backwards-compatible persistence for spark.ml

2016-09-09 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478131#comment-15478131 ] Joseph K. Bradley commented on SPARK-15573: --- I'd prefer to put this in unit tests to avoid more

[jira] [Commented] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib

2016-09-09 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478125#comment-15478125 ] Joseph K. Bradley commented on SPARK-5992: -- The design doc LGTM! Thanks for updating it. Shall

[jira] [Commented] (SPARK-17478) Create spark.eventLog.dir if it does not exist

2016-09-09 Thread Robert Kruszewski (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478087#comment-15478087 ] Robert Kruszewski commented on SPARK-17478: --- Thanks for the pointers and apologies that I

[jira] [Commented] (SPARK-17478) Create spark.eventLog.dir if it does not exist

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478085#comment-15478085 ] Apache Spark commented on SPARK-17478: -- User 'robert3005' has created a pull request for this issue:

[jira] [Commented] (SPARK-17479) Fix LDA example in docs

2016-09-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478058#comment-15478058 ] Nick Pentreath commented on SPARK-17479: I just ran Scala, Java and Python examples of {{ml}} for

[jira] [Commented] (SPARK-17479) Fix LDA example in docs

2016-09-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478050#comment-15478050 ] Nick Pentreath commented on SPARK-17479: I do see the data file:

[jira] [Commented] (SPARK-17480) CompressibleColumnBuilder inefficiently call gatherCompressibilityStats

2016-09-09 Thread Ergin Seyfe (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478043#comment-15478043 ] Ergin Seyfe commented on SPARK-17480: - Yes [~srowen], that was exactly same as my PR:

[jira] [Assigned] (SPARK-17480) CompressibleColumnBuilder inefficiently call gatherCompressibilityStats

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17480: Assignee: (was: Apache Spark) > CompressibleColumnBuilder inefficiently call

[jira] [Assigned] (SPARK-17480) CompressibleColumnBuilder inefficiently call gatherCompressibilityStats

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17480: Assignee: Apache Spark > CompressibleColumnBuilder inefficiently call

[jira] [Resolved] (SPARK-17478) Create spark.eventLog.dir if it does not exist

2016-09-09 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-17478. Resolution: Duplicate There are reasons why this will never be done. > Create

[jira] [Commented] (SPARK-17480) CompressibleColumnBuilder inefficiently call gatherCompressibilityStats

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478040#comment-15478040 ] Apache Spark commented on SPARK-17480: -- User 'seyfe' has created a pull request for this issue:

[jira] [Commented] (SPARK-17480) CompressibleColumnBuilder inefficiently call gatherCompressibilityStats

2016-09-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478035#comment-15478035 ] Sean Owen commented on SPARK-17480: --- Yeah, I wonder if the "unrolled" while loop here is really the

[jira] [Updated] (SPARK-17481) Flaky test: org.apache.spark.DistributedSuite.passing environment variables to cluster

2016-09-09 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-17481: - Attachment: log-17481.txt > Flaky test: org.apache.spark.DistributedSuite.passing environment variables

[jira] [Created] (SPARK-17481) Flaky test: org.apache.spark.DistributedSuite.passing environment variables to cluster

2016-09-09 Thread Yin Huai (JIRA)
Yin Huai created SPARK-17481: Summary: Flaky test: org.apache.spark.DistributedSuite.passing environment variables to cluster Key: SPARK-17481 URL: https://issues.apache.org/jira/browse/SPARK-17481

[jira] [Updated] (SPARK-17478) Create spark.eventLog.dir if it does not exist

2016-09-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-17478: -- Issue Type: Improvement (was: Bug) When this has come up previously, I think the problem has been

[jira] [Updated] (SPARK-17480) CompressibleColumnBuilder inefficiently call gatherCompressibilityStats

2016-09-09 Thread Ergin Seyfe (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ergin Seyfe updated SPARK-17480: Description: When we profiled one of our Spark jobs we saw that: 6.24% of the CPU is spent on

[jira] [Updated] (SPARK-17480) CompressibleColumnBuilder inefficiently call gatherCompressibilityStats

2016-09-09 Thread Ergin Seyfe (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ergin Seyfe updated SPARK-17480: Description: When we profiled one of our Spark jobs we saw that: 6.24% of the CPU is spent on

[jira] [Created] (SPARK-17480) CompressibleColumnBuilder inefficiently call gatherCompressibilityStats

2016-09-09 Thread Ergin Seyfe (JIRA)
Ergin Seyfe created SPARK-17480: --- Summary: CompressibleColumnBuilder inefficiently call gatherCompressibilityStats Key: SPARK-17480 URL: https://issues.apache.org/jira/browse/SPARK-17480 Project:

[jira] [Commented] (SPARK-17479) Fix LDA example in docs

2016-09-09 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478019#comment-15478019 ] Joseph K. Bradley commented on SPARK-17479: --- CC [~podongfeng] [~mlnick] from the previous PR on

[jira] [Created] (SPARK-17479) Fix LDA example in docs

2016-09-09 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-17479: - Summary: Fix LDA example in docs Key: SPARK-17479 URL: https://issues.apache.org/jira/browse/SPARK-17479 Project: Spark Issue Type: Documentation

[jira] [Created] (SPARK-17478) Create spark.eventLog.dir if it does not exist

2016-09-09 Thread Robert Kruszewski (JIRA)
Robert Kruszewski created SPARK-17478: - Summary: Create spark.eventLog.dir if it does not exist Key: SPARK-17478 URL: https://issues.apache.org/jira/browse/SPARK-17478 Project: Spark

[jira] [Updated] (SPARK-17474) Python UDF does not work between Sort and Limit

2016-09-09 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-17474: --- Summary: Python UDF does not work between Sort and Limit (was: expressions of QueryPlan does not

[jira] [Updated] (SPARK-17474) expressions of QueryPlan does not include those inside Option[Seq[Expression]]

2016-09-09 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-17474: --- Affects Version/s: (was: 1.6.2) (was: 1.5.2) > expressions of

[jira] [Assigned] (SPARK-17474) expressions of QueryPlan does not include those inside Option[Seq[Expression]]

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17474: Assignee: Davies Liu (was: Apache Spark) > expressions of QueryPlan does not include

[jira] [Commented] (SPARK-17474) expressions of QueryPlan does not include those inside Option[Seq[Expression]]

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477985#comment-15477985 ] Apache Spark commented on SPARK-17474: -- User 'davies' has created a pull request for this issue:

[jira] [Assigned] (SPARK-17474) expressions of QueryPlan does not include those inside Option[Seq[Expression]]

2016-09-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17474: Assignee: Apache Spark (was: Davies Liu) > expressions of QueryPlan does not include

[jira] [Updated] (SPARK-17476) Proper handling for unseen labels in logistic regression training.

2016-09-09 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai updated SPARK-17476: Issue Type: Sub-task (was: New Feature) Parent: SPARK-17133 > Proper handling for unseen labels

[jira] [Commented] (SPARK-17477) SparkSQL cannot handle schema evolution from Int -> Long when parquet files have Int as its type while hive metastore has Long as its type

2016-09-09 Thread Gang Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477945#comment-15477945 ] Gang Wu commented on SPARK-17477: - I'm working on a fix for this issue. Will send pull request soon. >

[jira] [Updated] (SPARK-17477) SparkSQL cannot handle schema evolution from Int -> Long when parquet files have Int as its type while hive metastore has Long as its type

2016-09-09 Thread Gang Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated SPARK-17477: Shepherd: Gang Wu > SparkSQL cannot handle schema evolution from Int -> Long when parquet files > have

[jira] [Commented] (SPARK-4563) Allow spark driver to bind to different ip then advertise ip

2016-09-09 Thread Sunil Kotagiri (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477942#comment-15477942 ] Sunil Kotagiri commented on SPARK-4563: --- +1 I also disagree that it is Minor bug. We spark driver,

[jira] [Created] (SPARK-17477) SparkSQL cannot handle schema evolution from Int -> Long when parquet files have Int as its type while hive metastore has Long as its type

2016-09-09 Thread Gang Wu (JIRA)
Gang Wu created SPARK-17477: --- Summary: SparkSQL cannot handle schema evolution from Int -> Long when parquet files have Int as its type while hive metastore has Long as its type Key: SPARK-17477 URL:

[jira] [Resolved] (SPARK-17433) YarnShuffleService doesn't handle moving credentials levelDb

2016-09-09 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-17433. --- Resolution: Fixed Assignee: Thomas Graves Fix Version/s: 2.1.0 >

[jira] [Commented] (SPARK-14221) Cross-publish Chill for Scala 2.12

2016-09-09 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477922#comment-15477922 ] Josh Rosen commented on SPARK-14221: Assuming that the license is the same, I don't see any reason

[jira] [Comment Edited] (SPARK-14221) Cross-publish Chill for Scala 2.12

2016-09-09 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477903#comment-15477903 ] Jakob Odersky edited comment on SPARK-14221 at 9/9/16 6:30 PM: ---

[jira] [Commented] (SPARK-14221) Cross-publish Chill for Scala 2.12

2016-09-09 Thread Jakob Odersky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477903#comment-15477903 ] Jakob Odersky commented on SPARK-14221: --- [~joshrosen]'s upstream PR requires Kryo 3.1, a version

[jira] [Updated] (SPARK-17475) HDFSMetadataLog should not leak CRC files

2016-09-09 Thread Frederick Reiss (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frederick Reiss updated SPARK-17475: Description: When HDFSMetadataLog uses a log directory on a filesystem other than HDFS

[jira] [Updated] (SPARK-17421) Document warnings about "MaxPermSize" parameter when building with Maven and Java 8

2016-09-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-17421: -- Priority: Trivial (was: Minor) Issue Type: Improvement (was: Bug) Summary: Document

[jira] [Commented] (SPARK-17466) Error message is not very clear

2016-09-09 Thread Tim Chan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477873#comment-15477873 ] Tim Chan commented on SPARK-17466: -- Thanks [~srowen]! > Error message is not very clear >

[jira] [Created] (SPARK-17476) Proper handling for unseen labels in logistic regression training.

2016-09-09 Thread Seth Hendrickson (JIRA)
Seth Hendrickson created SPARK-17476: Summary: Proper handling for unseen labels in logistic regression training. Key: SPARK-17476 URL: https://issues.apache.org/jira/browse/SPARK-17476 Project:

[jira] [Commented] (SPARK-17421) Warnings about "MaxPermSize" parameter when building with Maven and Java 8

2016-09-09 Thread Frederick Reiss (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477864#comment-15477864 ] Frederick Reiss commented on SPARK-17421: - Committer feedback on first PR was that the necessary

[jira] [Commented] (SPARK-17466) Error message is not very clear

2016-09-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477862#comment-15477862 ] Sean Owen commented on SPARK-17466: --- That would depend on your query. That's also a bit of a different

[jira] [Commented] (SPARK-17471) Add compressed method for Matrix class

2016-09-09 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477848#comment-15477848 ] DB Tsai commented on SPARK-17471: - BTW, we need to determine which sparse matrix format will be used for

[jira] [Commented] (SPARK-17466) Error message is not very clear

2016-09-09 Thread Tim Chan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477849#comment-15477849 ] Tim Chan commented on SPARK-17466: -- I suppose, I don't understand why I'm limited to 1 preceding. >

[jira] [Commented] (SPARK-17471) Add compressed method for Matrix class

2016-09-09 Thread Seth Hendrickson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477841#comment-15477841 ] Seth Hendrickson commented on SPARK-17471: -- [~yanboliang] I guess it can be seen as a duplicate,

[jira] [Updated] (SPARK-17456) Utility for parsing Spark versions

2016-09-09 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-17456: -- Fix Version/s: 2.0.1 > Utility for parsing Spark versions >

[jira] [Updated] (SPARK-16240) model loading backward compatibility for ml.clustering.LDA

2016-09-09 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-16240: -- Target Version/s: 2.0.1, 2.1.0 > model loading backward compatibility for

[jira] [Commented] (SPARK-17456) Utility for parsing Spark versions

2016-09-09 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477701#comment-15477701 ] Joseph K. Bradley commented on SPARK-17456: --- Backporting to 2.0 branch so that we can put the
