[jira] [Created] (SPARK-11000) Derby have booted the database twice in yarn security mode.

2015-10-08 Thread SaintBacchus (JIRA)
SaintBacchus created SPARK-11000: Summary: Derby have booted the database twice in yarn security mode. Key: SPARK-11000 URL: https://issues.apache.org/jira/browse/SPARK-11000 Project: Spark

[jira] [Commented] (SPARK-11000) Derby have booted the database twice in yarn security mode.

2015-10-08 Thread SaintBacchus (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948138#comment-14948138 ] SaintBacchus commented on SPARK-11000: -- Very similar with it but this is in yarn security mode. >

[jira] [Commented] (SPARK-11000) Derby have booted the database twice in yarn security mode.

2015-10-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948237#comment-14948237 ] Apache Spark commented on SPARK-11000: -- User 'SaintBacchus' has created a pull request for this

[jira] [Assigned] (SPARK-11000) Derby have booted the database twice in yarn security mode.

2015-10-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11000: Assignee: Apache Spark > Derby have booted the database twice in yarn security mode. >

[jira] [Assigned] (SPARK-11000) Derby have booted the database twice in yarn security mode.

2015-10-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11000: Assignee: (was: Apache Spark) > Derby have booted the database twice in yarn security

[jira] [Comment Edited] (SPARK-10981) R semijoin leads to Java errors, R leftsemi leads to Spark errors

2015-10-08 Thread Sun Rui (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948299#comment-14948299 ] Sun Rui edited comment on SPARK-10981 at 10/8/15 8:48 AM: -- yes, this is a bug in

[jira] [Commented] (SPARK-10981) R semijoin leads to Java errors, R leftsemi leads to Spark errors

2015-10-08 Thread Sun Rui (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948299#comment-14948299 ] Sun Rui commented on SPARK-10981: - yes, this is a bug in SparkR. your fix looks good. Could you submit a

[jira] [Closed] (SPARK-10879) spark on yarn support priority option

2015-10-08 Thread Yun Zhao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yun Zhao closed SPARK-10879. Resolution: Later > spark on yarn support priority option > - > >

[jira] [Commented] (SPARK-10971) sparkR: RRunner should allow setting path to Rscript

2015-10-08 Thread Sun Rui (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948307#comment-14948307 ] Sun Rui commented on SPARK-10971: - just be curious: how do you distribute RScript to YARN nodes? Why not

[jira] [Updated] (SPARK-10960) SQL with windowing function cannot reference column in inner select block

2015-10-08 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10960: Description: There seems to be a bug in the Spark SQL parser when I use windowing functions.

[jira] [Commented] (SPARK-10903) Make sqlContext global

2015-10-08 Thread Sun Rui (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948268#comment-14948268 ] Sun Rui commented on SPARK-10903: - There are a number of functions defined in SQLContext.R taking a

[jira] [Resolved] (SPARK-7869) Spark Data Frame Fails to Load Postgres Tables with JSONB DataType Columns

2015-10-08 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-7869. Resolution: Fixed Fix Version/s: 1.6.0 > Spark Data Frame Fails to Load Postgres Tables with

[jira] [Commented] (SPARK-10977) SQL injection bugs in JdbcUtils and DataFrameWriter

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948302#comment-14948302 ] Sean Owen commented on SPARK-10977: --- It's a JDBC thing rather than database specific (e.g. parsed by

[jira] [Commented] (SPARK-10999) Physical plan node Coalesce should be able to handle UnsafeRow

2015-10-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948133#comment-14948133 ] Apache Spark commented on SPARK-10999: -- User 'liancheng' has created a pull request for this issue:

[jira] [Assigned] (SPARK-10999) Physical plan node Coalesce should be able to handle UnsafeRow

2015-10-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10999: Assignee: Cheng Lian (was: Apache Spark) > Physical plan node Coalesce should be able to

[jira] [Assigned] (SPARK-10999) Physical plan node Coalesce should be able to handle UnsafeRow

2015-10-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10999: Assignee: Apache Spark (was: Cheng Lian) > Physical plan node Coalesce should be able to

[jira] [Updated] (SPARK-10974) Add progress bar for output operation column and use red dots for failed batches

2015-10-08 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-10974: - Summary: Add progress bar for output operation column and use red dots for failed batches (was:

[jira] [Created] (SPARK-10999) Physical plan node Coalesce should be able to handle UnsafeRow

2015-10-08 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-10999: -- Summary: Physical plan node Coalesce should be able to handle UnsafeRow Key: SPARK-10999 URL: https://issues.apache.org/jira/browse/SPARK-10999 Project: Spark

[jira] [Commented] (SPARK-10326) Cannot launch YARN job on Windows

2015-10-08 Thread Jose Antonio (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948206#comment-14948206 ] Jose Antonio commented on SPARK-10326: -- C:\WINDOWS\system32>pyspark --master yarn-client Python

[jira] [Commented] (SPARK-10919) Association rules class should return the support of each rule

2015-10-08 Thread Tofigh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948235#comment-14948235 ] Tofigh commented on SPARK-10919: sure > Association rules class should return the support of each rule >

[jira] [Issue Comment Deleted] (SPARK-11000) Derby have booted the database twice in yarn security mode.

2015-10-08 Thread SaintBacchus (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SaintBacchus updated SPARK-11000: - Comment: was deleted (was: Very similar with it but this is in yarn security mode.) > Derby

[jira] [Resolved] (SPARK-9040) StructField datatype Conversion Error

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-9040. -- Resolution: Not A Problem Fix Version/s: (was: 1.4.0) > StructField datatype Conversion

[jira] [Updated] (SPARK-10752) Implement corr() and cov in DataFrameStatFunctions

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-10752: -- Assignee: Sun Rui > Implement corr() and cov in DataFrameStatFunctions >

[jira] [Commented] (SPARK-10914) Incorrect empty join sets when executor-memory >= 32g

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948511#comment-14948511 ] Sean Owen commented on SPARK-10914: --- I don't think having it on one machine necessarily matters. You

[jira] [Commented] (SPARK-10914) Incorrect empty join sets when executor-memory >= 32g

2015-10-08 Thread Ben Moran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948492#comment-14948492 ] Ben Moran commented on SPARK-10914: --- I just tried moving the master to the worker box, so it's entirely

[jira] [Reopened] (SPARK-9040) StructField datatype Conversion Error

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reopened SPARK-9040: -- > StructField datatype Conversion Error > - > > Key:

[jira] [Resolved] (SPARK-10939) Misaligned data with RDD.zip after repartition

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-10939. --- Resolution: Not A Problem Provisionally resolving as not a problem since RDDs don't have a

[jira] [Updated] (SPARK-10979) SparkR: Add merge to DataFrame

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-10979: -- Component/s: SparkR > SparkR: Add merge to DataFrame > -- > >

[jira] [Commented] (SPARK-10914) Incorrect empty join sets when executor-memory >= 32g

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948486#comment-14948486 ] Sean Owen commented on SPARK-10914: --- Still kind of guessing here... but what if the problem is that the

[jira] [Reopened] (SPARK-10940) Too many open files Spark Shuffle

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reopened SPARK-10940: --- > Too many open files Spark Shuffle > - > > Key:

[jira] [Commented] (SPARK-10939) Misaligned data with RDD.zip after repartition

2015-10-08 Thread Michael Malak (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948596#comment-14948596 ] Michael Malak commented on SPARK-10939: --- Here Matei explains the explicit design decision to prefer

[jira] [Created] (SPARK-11001) SQLContext doesn't support window function

2015-10-08 Thread jixing.ji (JIRA)
jixing.ji created SPARK-11001: - Summary: SQLContext doesn't support window function Key: SPARK-11001 URL: https://issues.apache.org/jira/browse/SPARK-11001 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-10914) Incorrect empty join sets when executor-memory >= 32g

2015-10-08 Thread Ben Moran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948460#comment-14948460 ] Ben Moran commented on SPARK-10914: --- On latest master for me .count() also always seems to return 5 for

[jira] [Resolved] (SPARK-10883) Document building each module individually

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-10883. --- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 8993

[jira] [Resolved] (SPARK-10940) Too many open files Spark Shuffle

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-10940. --- Resolution: Cannot Reproduce > Too many open files Spark Shuffle > -

[jira] [Commented] (SPARK-10914) Incorrect empty join sets when executor-memory >= 32g

2015-10-08 Thread Ben Moran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948513#comment-14948513 ] Ben Moran commented on SPARK-10914: --- I think you've got it - if I also turn off UseCompressedOops for

[jira] [Updated] (SPARK-10978) Allow PrunedFilterScan to eliminate predicates from further evaluation

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-10978: -- Priority: Minor (was: Major) ([~rspitzer] don't set Fix Version) > Allow PrunedFilterScan to

[jira] [Created] (SPARK-11002) pyspark doesn't support UDAF

2015-10-08 Thread jixing.ji (JIRA)
jixing.ji created SPARK-11002: - Summary: pyspark doesn't support UDAF Key: SPARK-11002 URL: https://issues.apache.org/jira/browse/SPARK-11002 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-10914) Incorrect empty join sets when executor-memory >= 32g

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948438#comment-14948438 ] Sean Owen commented on SPARK-10914: --- I ran the latest master in standalone mode, with

[jira] [Updated] (SPARK-10883) Document building each module individually

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-10883: -- Assignee: Jean-Baptiste Onofré > Document building each module individually >

[jira] [Commented] (SPARK-10914) Incorrect empty join sets when executor-memory >= 32g

2015-10-08 Thread Ben Moran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948494#comment-14948494 ] Ben Moran commented on SPARK-10914: --- Either using the large heap, or -XX:-UseCompressedOops triggers

[jira] [Commented] (SPARK-10914) Incorrect empty join sets when executor-memory >= 32g

2015-10-08 Thread Ben Moran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948464#comment-14948464 ] Ben Moran commented on SPARK-10914: --- I also don't see it if I run spark-shell without setting --master.

[jira] [Updated] (SPARK-10925) Exception when joining DataFrames

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-10925: -- Component/s: SQL > Exception when joining DataFrames > - > >

[jira] [Updated] (SPARK-10939) Misaligned data with RDD.zip after repartition

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-10939: -- Component/s: Spark Core > Misaligned data with RDD.zip after repartition >

[jira] [Commented] (SPARK-10995) Graceful shutdown drops processing in Spark Streaming

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948693#comment-14948693 ] Sean Owen commented on SPARK-10995: --- TD's the expert, but I don't really get that -- if you're

[jira] [Commented] (SPARK-4105) FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle

2015-10-08 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948678#comment-14948678 ] Imran Rashid commented on SPARK-4105: - [~nadenf] The FetchFailures don't need to be on the same node

[jira] [Created] (SPARK-11003) Allowing UserDefinedTypes to extend primatives

2015-10-08 Thread John Muller (JIRA)
John Muller created SPARK-11003: --- Summary: Allowing UserDefinedTypes to extend primatives Key: SPARK-11003 URL: https://issues.apache.org/jira/browse/SPARK-11003 Project: Spark Issue Type:

[jira] [Commented] (SPARK-7874) Add a global setting for the fine-grained mesos scheduler that limits the number of concurrent tasks of a job

2015-10-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948638#comment-14948638 ] Apache Spark commented on SPARK-7874: - User 'dragos' has created a pull request for this issue:

[jira] [Commented] (SPARK-10995) Graceful shutdown drops processing in Spark Streaming

2015-10-08 Thread Michal Cizmazia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948667#comment-14948667 ] Michal Cizmazia commented on SPARK-10995: - [On 7 October 2015 at 21:24, Tathagata

[jira] [Updated] (SPARK-11004) MapReduce Hive-like join operations for RDDs

2015-10-08 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Glenn Strycker updated SPARK-11004: --- Description: Could a feature be added to Spark that would use disk-only MapReduce operations

[jira] [Created] (SPARK-11004) MapReduce Hive-like join operations for RDDs

2015-10-08 Thread Glenn Strycker (JIRA)
Glenn Strycker created SPARK-11004: -- Summary: MapReduce Hive-like join operations for RDDs Key: SPARK-11004 URL: https://issues.apache.org/jira/browse/SPARK-11004 Project: Spark Issue Type:

[jira] [Resolved] (SPARK-10999) Physical plan node Coalesce should be able to handle UnsafeRow

2015-10-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-10999. Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 9024

[jira] [Commented] (SPARK-11004) MapReduce Hive-like join operations for RDDs

2015-10-08 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948943#comment-14948943 ] Glenn Strycker commented on SPARK-11004: True, fixing the 2GB will go a long way. However, this

[jira] [Updated] (SPARK-11005) Spark 1.5 Shuffle performance - (sort-based shuffle)

2015-10-08 Thread Sandeep Pal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Pal updated SPARK-11005: Summary: Spark 1.5 Shuffle performance - (sort-based shuffle) (was: Spark 1.5 Shuffle

[jira] [Resolved] (SPARK-10987) yarn-client mode misbehaving with netty-based RPC backend

2015-10-08 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-10987. Resolution: Fixed Assignee: Marcelo Vanzin Fix Version/s: 1.6.0 >

[jira] [Commented] (SPARK-11004) MapReduce Hive-like join operations for RDDs

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948920#comment-14948920 ] Sean Owen commented on SPARK-11004: --- Spark has had a sort-based shuffle for a while, which is a lot of

[jira] [Commented] (SPARK-10942) Not all cached RDDs are unpersisted

2015-10-08 Thread Nick Pritchard (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948928#comment-14948928 ] Nick Pritchard commented on SPARK-10942: Thanks [~sowen] for trying! I'll let it go. > Not all

[jira] [Assigned] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given

2015-10-08 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-10858: - Assignee: Thomas Graves > YARN: archives/jar/files rename with # doesn't work unless

[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given

2015-10-08 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948941#comment-14948941 ] Thomas Graves commented on SPARK-10858: --- Sorry for the delay on this didn't have time to look at

[jira] [Created] (SPARK-11005) Spark 1.5 Shuffle performance

2015-10-08 Thread Sandeep Pal (JIRA)
Sandeep Pal created SPARK-11005: --- Summary: Spark 1.5 Shuffle performance Key: SPARK-11005 URL: https://issues.apache.org/jira/browse/SPARK-11005 Project: Spark Issue Type: Question

[jira] [Updated] (SPARK-11005) Spark 1.5 Shuffle performance - (sort-based shuffle)

2015-10-08 Thread Sandeep Pal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Pal updated SPARK-11005: Environment: 6 node cluster with 1 master and 5 worker nodes. Memory > 100 GB each Cores = 72 each

[jira] [Resolved] (SPARK-10836) SparkR: Add sort function to dataframe

2015-10-08 Thread Shivaram Venkataraman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-10836. --- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request

[jira] [Updated] (SPARK-10836) SparkR: Add sort function to dataframe

2015-10-08 Thread Shivaram Venkataraman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-10836: -- Assignee: Narine Kokhlikyan > SparkR: Add sort function to dataframe >

[jira] [Resolved] (SPARK-10998) Show non-children in default Expression.toString

2015-10-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-10998. -- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 9022

[jira] [Resolved] (SPARK-8654) Analysis exception when using "NULL IN (...)": invalid cast

2015-10-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-8654. - Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 8983

[jira] [Commented] (SPARK-11004) MapReduce Hive-like join operations for RDDs

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949060#comment-14949060 ] Sean Owen commented on SPARK-11004: --- Literally run a Mapper and Reducer on Spark? I think it would be

[jira] [Commented] (SPARK-11005) Spark 1.5 Shuffle performance - (sort-based shuffle)

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949059#comment-14949059 ] Sean Owen commented on SPARK-11005: --- [~vnayak053] could I ask you to put this on the mailing list? It's

[jira] [Updated] (SPARK-10914) UnsafeRow serialization breaks when two machines have different Oops size

2015-10-08 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10914: Description: Updated description (by rxin on Oct 8, 2015) To reproduce, launch Spark using {code}

[jira] [Created] (SPARK-11007) Add dictionary support for CatalystDecimalConverter

2015-10-08 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-11007: -- Summary: Add dictionary support for CatalystDecimalConverter Key: SPARK-11007 URL: https://issues.apache.org/jira/browse/SPARK-11007 Project: Spark Issue Type:

[jira] [Updated] (SPARK-10914) UnsafeRow serialization breaks when two machines have different Oops size

2015-10-08 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10914: Summary: UnsafeRow serialization breaks when two machines have different Oops size (was:

[jira] [Commented] (SPARK-11004) MapReduce Hive-like join operations for RDDs

2015-10-08 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949076#comment-14949076 ] Glenn Strycker commented on SPARK-11004: Currently we could do the following from within a linux

[jira] [Commented] (SPARK-11006) Rename NullColumnAccess as NullColumnAccessor

2015-10-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949010#comment-14949010 ] Apache Spark commented on SPARK-11006: -- User 'tedyu' has created a pull request for this issue:

[jira] [Assigned] (SPARK-11006) Rename NullColumnAccess as NullColumnAccessor

2015-10-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11006: Assignee: (was: Apache Spark) > Rename NullColumnAccess as NullColumnAccessor >

[jira] [Assigned] (SPARK-11006) Rename NullColumnAccess as NullColumnAccessor

2015-10-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11006: Assignee: Apache Spark > Rename NullColumnAccess as NullColumnAccessor >

[jira] [Updated] (SPARK-10914) UnsafeRow serialization breaks when two machines have different Oops size

2015-10-08 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10914: Description: *Updated description (by rxin on Oct 8, 2015)* To reproduce, launch Spark using

[jira] [Updated] (SPARK-10914) UnsafeRow serialization breaks when two machines have different Oops size

2015-10-08 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10914: Description: Updated description (by rxin on Oct 8, 2015) To reproduce, launch Spark using

[jira] [Commented] (SPARK-10914) UnsafeRow serialization breaks when two machines have different Oops size

2015-10-08 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949079#comment-14949079 ] Reynold Xin commented on SPARK-10914: - OK I figured it out. Updated the description. > UnsafeRow

[jira] [Updated] (SPARK-10914) UnsafeRow serialization breaks when two machines have different Oops size

2015-10-08 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10914: Description: *Updated description (by rxin on Oct 8, 2015)* To reproduce, launch Spark using

[jira] [Updated] (SPARK-11008) Spark window function returns inconsistent/wrong results

2015-10-08 Thread Prasad Chalasani (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Chalasani updated SPARK-11008: - Description: Summary: applying a windowing function on a data-frame, followed by count()

[jira] [Updated] (SPARK-11009) RowNumber in HiveContext returns negative values in cluster mode

2015-10-08 Thread Saif Addin Ellafi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saif Addin Ellafi updated SPARK-11009: -- Environment: Standalone cluster mode. No hadoop/hive is present in the environment (no

[jira] [Commented] (SPARK-10971) sparkR: RRunner should allow setting path to Rscript

2015-10-08 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949167#comment-14949167 ] Felix Cheung commented on SPARK-10971: -- I think he is suggesting the path to R/Rscript to be

[jira] [Commented] (SPARK-10995) Graceful shutdown drops processing in Spark Streaming

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949169#comment-14949169 ] Sean Owen commented on SPARK-10995: --- Ah right what I mean is that your _slide duration_ is equal to

[jira] [Commented] (SPARK-11004) MapReduce Hive-like join operations for RDDs

2015-10-08 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949171#comment-14949171 ] Glenn Strycker commented on SPARK-11004: So maybe we can simplify this idea down to forcing

[jira] [Commented] (SPARK-11004) MapReduce Hive-like join operations for RDDs

2015-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949176#comment-14949176 ] Sean Owen commented on SPARK-11004: --- I suppose I'd be surprised if using disk over memory helped, but

[jira] [Commented] (SPARK-10903) Make sqlContext global

2015-10-08 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949185#comment-14949185 ] Felix Cheung commented on SPARK-10903: -- [~sunrui] Agreed. I'd like to propose adding .Deprecated to

[jira] [Commented] (SPARK-11004) MapReduce Hive-like join operations for RDDs

2015-10-08 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949187#comment-14949187 ] Glenn Strycker commented on SPARK-11004: Awesome -- thanks, I'll try that out. Is there a way to

[jira] [Commented] (SPARK-10971) sparkR: RRunner should allow setting path to Rscript

2015-10-08 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949199#comment-14949199 ] Thomas Graves commented on SPARK-10971: --- you shouldn't have to install everything a user needs on

[jira] [Commented] (SPARK-10382) Make example code in user guide testable

2015-10-08 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949326#comment-14949326 ] Xusen Yin commented on SPARK-10382: --- Or I can custom it to use in Spark project easily. > Make example

[jira] [Commented] (SPARK-8546) PMML export for Naive Bayes

2015-10-08 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949361#comment-14949361 ] Xusen Yin commented on SPARK-8546: -- Hi [~mengxr], I'd like to work on it. > PMML export for Naive Bayes

[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given

2015-10-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949372#comment-14949372 ] Apache Spark commented on SPARK-10858: -- User 'tgravescs' has created a pull request for this issue:

[jira] [Created] (SPARK-11015) Add computeCost and clusterCenters to KMeansModel in spark.ml package

2015-10-08 Thread Richard Garris (JIRA)
Richard Garris created SPARK-11015: -- Summary: Add computeCost and clusterCenters to KMeansModel in spark.ml package Key: SPARK-11015 URL: https://issues.apache.org/jira/browse/SPARK-11015 Project:

[jira] [Commented] (SPARK-5949) Driver program has to register roaring bitmap classes used by spark with Kryo when number of partitions is greater than 2000

2015-10-08 Thread Charles Allen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949525#comment-14949525 ] Charles Allen commented on SPARK-5949: -- [~lemire] pinging to see if you have any suggestions on how

[jira] [Commented] (SPARK-10382) Make example code in user guide testable

2015-10-08 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949324#comment-14949324 ] Xusen Yin commented on SPARK-10382: --- Hi Xiangrui, As far as I know, there are several plugins of

[jira] [Created] (SPARK-11014) RPC Time Out Exceptions

2015-10-08 Thread Gurpreet Singh (JIRA)
Gurpreet Singh created SPARK-11014: -- Summary: RPC Time Out Exceptions Key: SPARK-11014 URL: https://issues.apache.org/jira/browse/SPARK-11014 Project: Spark Issue Type: Bug Affects

[jira] [Commented] (SPARK-6723) Model import/export for ChiSqSelector

2015-10-08 Thread Jayant Shekhar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949548#comment-14949548 ] Jayant Shekhar commented on SPARK-6723: --- Thanks [~fliang] [~mengxr] Can you trigger tests on the

[jira] [Commented] (SPARK-10936) UDAF "mode" for categorical variables

2015-10-08 Thread Frederick Reiss (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949306#comment-14949306 ] Frederick Reiss commented on SPARK-10936: - Mode is not an algebraic aggregate. To find the mode

[jira] [Created] (SPARK-11013) SparkPlan may mistakenly register child plan's accumulators for SQL metrics

2015-10-08 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-11013: --- Summary: SparkPlan may mistakenly register child plan's accumulators for SQL metrics Key: SPARK-11013 URL: https://issues.apache.org/jira/browse/SPARK-11013 Project:

[jira] [Commented] (SPARK-8654) Analysis exception when using "NULL IN (...)": invalid cast

2015-10-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949360#comment-14949360 ] Apache Spark commented on SPARK-8654: - User 'marmbrus' has created a pull request for this issue:

[jira] [Resolved] (SPARK-10988) Reduce duplication in Aggregate2's expression rewriting logic

2015-10-08 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-10988. -- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 9015

[jira] [Commented] (SPARK-5949) Driver program has to register roaring bitmap classes used by spark with Kryo when number of partitions is greater than 2000

2015-10-08 Thread Charles Allen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949518#comment-14949518 ] Charles Allen commented on SPARK-5949: -- This breaks when using more recent versions of Roaring where
