[jira] [Commented] (SPARK-13305) With SPARK_WORKER_WEBUI_PORT and --webui-port set for start-slave.sh script, --webui-port is used twice

2016-02-12 Thread Jacek Laskowski (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145512#comment-15145512
 ] 

Jacek Laskowski commented on SPARK-13305:
-

I was explicit here about the different ways of setting the port of the worker's web 
UI, but it could be much harder to figure out if {{SPARK_WORKER_WEBUI_PORT=1}} 
were set in {{conf/spark-env.sh}} instead.

The point is that 
[WorkerArguments|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/worker/WorkerArguments.scala#L48]
 is doing the env var lookup anyway and there is no need to have the env var 
mapped to {{--webui-port}} inside {{sbin/start-slave.sh}}. I think 
{{sbin/start-slave.sh}} should *not* set it as {{--webui-port}}, but just 
{{export}} it (as {{bin/load-spark-env.sh}} does with the other env vars).

It should be an easy fix that would make the Spark command easier on the eyes, i.e. 
without {{--webui-port}} appearing twice (which just looks messy).
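
For reference, a minimal sketch of the kind of env-var fallback {{WorkerArguments}} already performs (simplified, not the exact source), which is why exporting the variable should be enough:

{code}
// Sketch of the pattern WorkerArguments follows (simplified, not the actual source):
// the env var is read as a default, so an explicit --webui-port still overrides it.
var webUiPort = 8081
if (System.getenv("SPARK_WORKER_WEBUI_PORT") != null) {
  webUiPort = System.getenv("SPARK_WORKER_WEBUI_PORT").toInt
}
{code}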

> With SPARK_WORKER_WEBUI_PORT and --webui-port set for start-slave.sh script, 
> --webui-port is used twice
> ---
>
> Key: SPARK-13305
> URL: https://issues.apache.org/jira/browse/SPARK-13305
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> Executing the following command to start a worker:
> {code}
> SPARK_WORKER_WEBUI_PORT=1 ./sbin/start-slave.sh spark://localhost:7077 
> --webui-port 2
> {code}
> ends up with the following Spark command (in the log file) -- some characters 
> cut out to keep only the relevant part:
> {code}
> Spark Command: [cut] org.apache.spark.deploy.worker.Worker --webui-port 1 
> spark://localhost:7077 --webui-port 2
> {code}
> Note {{--webui-port}} set twice.






[jira] [Updated] (SPARK-6166) Limit number of in flight outbound requests for shuffle fetch

2016-02-12 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-6166:

Assignee: Sanket Reddy

> Limit number of in flight outbound requests for shuffle fetch
> -
>
> Key: SPARK-6166
> URL: https://issues.apache.org/jira/browse/SPARK-6166
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: Mridul Muralidharan
>Assignee: Sanket Reddy
>Priority: Minor
> Fix For: 2.0.0
>
>
> spark.reducer.maxMbInFlight puts a bound on the in-flight data in terms of 
> size.
> But this is not always sufficient: as the number of hosts in the cluster 
> increases, this can lead to a very large number of inbound connections to one 
> or more nodes, causing workers to fail under the load.
> I propose we also add a spark.reducer.maxReqsInFlight, which puts a bound on 
> the number of outstanding outbound requests.
> This might still cause hotspots in the cluster, but in our tests this has 
> significantly reduced the occurrence of worker failures.
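
As a rough illustration of the proposal above (the {{spark.reducer.maxReqsInFlight}} name is the one proposed in the description; the values are placeholders, not recommendations):

{code}
// Hedged sketch: bound shuffle fetches both by size and by request count.
// spark.reducer.maxMbInFlight is the existing size bound mentioned above;
// spark.reducer.maxReqsInFlight is the proposed count bound.
val conf = new org.apache.spark.SparkConf()
  .set("spark.reducer.maxMbInFlight", "48")      // existing: MB of in-flight data
  .set("spark.reducer.maxReqsInFlight", "256")   // proposed: outstanding fetch requests
{code}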






[jira] [Updated] (SPARK-13306) Uncorrelated scalar subquery

2016-02-12 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-13306:
---
Component/s: SQL

> Uncorrelated scalar subquery
> 
>
> Key: SPARK-13306
> URL: https://issues.apache.org/jira/browse/SPARK-13306
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> A scalar subquery is a subquery that generates only a single row and a single 
> column, and can be used as part of an expression.
> An uncorrelated scalar subquery is one that does not have a reference to an 
> external (outer) table.
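
For illustration, a hedged sketch of such a query (table and column names are made up; assumes a spark-shell {{sqlContext}}):

{code}
// Hedged sketch (names are made up): the inner query yields a single row and a
// single column and does not reference the outer table, so it is an
// uncorrelated scalar subquery used inside an expression.
val df = sqlContext.sql(
  "SELECT name, salary FROM employees " +
  "WHERE salary > (SELECT avg(salary) FROM employees)")
{code}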






[jira] [Commented] (SPARK-7367) spark-submit CLI --help -h overrides the application arguments

2016-02-12 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145520#comment-15145520
 ] 

Apache Spark commented on SPARK-7367:
-

User 'BimalTandel' has created a pull request for this issue:
https://github.com/apache/spark/pull/11191

> spark-submit CLI --help -h overrides the application arguments
> --
>
> Key: SPARK-7367
> URL: https://issues.apache.org/jira/browse/SPARK-7367
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Reporter: Gianmario Spacagna
>Priority: Minor
>
> The spark-submit script will parse the --help argument even if it is provided as 
> an application argument.
> E.g. 
> spark-submit --master local[*] --driver-memory 4G --class bar.foo.MyClass 
> /MyLocalJAR.jar --help
> or
> spark-submit --master local[*] --driver-memory 4G --class bar.foo.MyClass 
> /MyLocalJAR.jar -h
> If my application is using a parsing library, such as Scallop, then it will 
> never be able to run the application with --help as an argument.
> I think the spark-submit script should only print the help message when it is 
> provided as the only argument, like this:
> spark-submit --help
> or it should provide a separator for trailing arguments:
> spark-submit --master local[*] --driver-memory 4G --class bar.foo.MyClass 
> /MyLocalJAR.jar -- --help --arg1 --arg2
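
A toy sketch of the separator idea in the last example above (hypothetical, not how spark-submit actually parses arguments): everything before {{--}} would be consumed by spark-submit and everything after it passed untouched to the application:

{code}
// Hypothetical sketch of splitting arguments at a "--" separator.
val args = Seq("--master", "local[*]", "--class", "bar.foo.MyClass",
               "/MyLocalJAR.jar", "--", "--help", "--arg1", "--arg2")
val (sparkArgs, rest) = args.span(_ != "--")
val appArgs = rest.drop(1)  // drop the "--" marker itself
// sparkArgs -> parsed by spark-submit; appArgs -> passed verbatim to the app
{code}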






[jira] [Assigned] (SPARK-7367) spark-submit CLI --help -h overrides the application arguments

2016-02-12 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7367:
---

Assignee: Apache Spark

> spark-submit CLI --help -h overrides the application arguments
> --
>
> Key: SPARK-7367
> URL: https://issues.apache.org/jira/browse/SPARK-7367
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Reporter: Gianmario Spacagna
>Assignee: Apache Spark
>Priority: Minor
>
> The spark-submit script will parse the --help argument even if it is provided as 
> an application argument.
> E.g. 
> spark-submit --master local[*] --driver-memory 4G --class bar.foo.MyClass 
> /MyLocalJAR.jar --help
> or
> spark-submit --master local[*] --driver-memory 4G --class bar.foo.MyClass 
> /MyLocalJAR.jar -h
> If my application is using a parsing library, such as Scallop, then it will 
> never be able to run the application with --help as an argument.
> I think the spark-submit script should only print the help message when it is 
> provided as the only argument, like this:
> spark-submit --help
> or it should provide a separator for trailing arguments:
> spark-submit --master local[*] --driver-memory 4G --class bar.foo.MyClass 
> /MyLocalJAR.jar -- --help --arg1 --arg2






[jira] [Assigned] (SPARK-7367) spark-submit CLI --help -h overrides the application arguments

2016-02-12 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7367:
---

Assignee: (was: Apache Spark)

> spark-submit CLI --help -h overrides the application arguments
> --
>
> Key: SPARK-7367
> URL: https://issues.apache.org/jira/browse/SPARK-7367
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Reporter: Gianmario Spacagna
>Priority: Minor
>
> The spark-submit script will parse the --help argument even if it is provided as 
> an application argument.
> E.g. 
> spark-submit --master local[*] --driver-memory 4G --class bar.foo.MyClass 
> /MyLocalJAR.jar --help
> or
> spark-submit --master local[*] --driver-memory 4G --class bar.foo.MyClass 
> /MyLocalJAR.jar -h
> If my application is using a parsing library, such as Scallop, then it will 
> never be able to run the application with --help as an argument.
> I think the spark-submit script should only print the help message when it is 
> provided as the only argument, like this:
> spark-submit --help
> or it should provide a separator for trailing arguments:
> spark-submit --master local[*] --driver-memory 4G --class bar.foo.MyClass 
> /MyLocalJAR.jar -- --help --arg1 --arg2






[jira] [Commented] (SPARK-13305) With SPARK_WORKER_WEBUI_PORT and --webui-port set for start-slave.sh script, --webui-port is used twice

2016-02-12 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145396#comment-15145396
 ] 

Sean Owen commented on SPARK-13305:
---

That looks like what I'd expect it to do. You set the value twice, in two 
different ways, for some reason, and it's set twice. If it's not what you want to 
do, then don't do it; how can this be a problem?

> With SPARK_WORKER_WEBUI_PORT and --webui-port set for start-slave.sh script, 
> --webui-port is used twice
> ---
>
> Key: SPARK-13305
> URL: https://issues.apache.org/jira/browse/SPARK-13305
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> Executing the following command to start a worker:
> {code}
> SPARK_WORKER_WEBUI_PORT=1 ./sbin/start-slave.sh spark://localhost:7077 
> --webui-port 2
> {code}
> ends up with the following Spark command (in the log file) -- some characters 
> cut out to keep only the relevant part:
> {code}
> Spark Command: [cut] org.apache.spark.deploy.worker.Worker --webui-port 1 
> spark://localhost:7077 --webui-port 2
> {code}
> Note {{--webui-port}} set twice.






[jira] [Updated] (SPARK-13307) TPCDS query 66 degraded by 30% in 1.6.0 compared to 1.4.1

2016-02-12 Thread JESSE CHEN (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JESSE CHEN updated SPARK-13307:
---
Summary: TPCDS query 66 degraded by 30% in 1.6.0 compared to 1.4.1  (was: 
TPCDS query 66 degraded by 35% in 1.6.0 compared to 1.4.1)

> TPCDS query 66 degraded by 30% in 1.6.0 compared to 1.4.1
> -
>
> Key: SPARK-13307
> URL: https://issues.apache.org/jira/browse/SPARK-13307
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: JESSE CHEN
>  Labels: spark,, sql,
>
> The majority of the TPCDS queries ran faster in 1.6.0 than in 1.4.1, on 
> average about 9% faster. A few degraded, and one that is definitely not 
> within the error margin is query 66.
> Query 66 in 1.4.1: 699 seconds
> Query 66 in 1.6.0: 918 seconds
> 30% worse.
> I collected the physical plans from both versions; the drastic difference may 
> come partially from using Tungsten in 1.6, but is anything else at play here?
> Please see plans here:
> https://ibm.box.com/spark-sql-q66-debug-160plan
> https://ibm.box.com/spark-sql-q66-debug-141plan






[jira] [Created] (SPARK-13307) TPCDS query 66 degraded by 35% in 1.6.0 compared to 1.4.1

2016-02-12 Thread JESSE CHEN (JIRA)
JESSE CHEN created SPARK-13307:
--

 Summary: TPCDS query 66 degraded by 35% in 1.6.0 compared to 1.4.1
 Key: SPARK-13307
 URL: https://issues.apache.org/jira/browse/SPARK-13307
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0
Reporter: JESSE CHEN


The majority of the TPCDS queries ran faster in 1.6.0 than in 1.4.1, on average 
about 9% faster. A few degraded, and one that is definitely not within the 
error margin is query 66.

Query 66 in 1.4.1: 699 seconds
Query 66 in 1.6.0: 918 seconds

30% worse.

I collected the physical plans from both versions; the drastic difference may 
come partially from using Tungsten in 1.6, but is anything else at play here?

Please see plans here:

https://ibm.box.com/spark-sql-q66-debug-160plan
https://ibm.box.com/spark-sql-q66-debug-141plan








[jira] [Commented] (SPARK-5095) Support launching multiple mesos executors in coarse grained mesos mode

2016-02-12 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145211#comment-15145211
 ] 

Apache Spark commented on SPARK-5095:
-

User 'mgummelt' has created a pull request for this issue:
https://github.com/apache/spark/pull/11164

> Support launching multiple mesos executors in coarse grained mesos mode
> ---
>
> Key: SPARK-5095
> URL: https://issues.apache.org/jira/browse/SPARK-5095
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 1.0.0
>Reporter: Timothy Chen
>Assignee: Timothy Chen
> Fix For: 2.0.0
>
>
> Currently in coarse-grained Mesos mode, it's expected that we only launch one 
> Mesos executor that launches one JVM process to launch multiple Spark 
> executors.
> However, this becomes a problem when the JVM process launched is larger than 
> an ideal size (30 GB is the recommended value from Databricks), which causes the GC 
> problems reported on the mailing list.
> We should support launching multiple executors when large enough resources 
> are available for Spark to use and those resources are still under the 
> configured limit.
> This is also applicable when users want to specify the number of executors to be 
> launched on each node.
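
A hedged configuration sketch of the idea; using {{spark.executor.cores}} as the per-executor cap is an assumption for illustration, not necessarily the implemented mechanism:

{code}
// Hedged sketch: cap the size of a single executor JVM so that, on a large
// node, several executors can be launched instead of one oversized one.
val conf = new org.apache.spark.SparkConf()
  .set("spark.executor.memory", "30g")  // keep each JVM near the "ideal size" above
  .set("spark.executor.cores", "8")     // assumed per-executor core cap
  .set("spark.cores.max", "64")         // stay under the configured overall limit
{code}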






[jira] [Commented] (SPARK-9763) Minimize exposure of internal SQL classes

2016-02-12 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145229#comment-15145229
 ] 

Reynold Xin commented on SPARK-9763:


[~flysjy] is that caused by this ticket?


> Minimize exposure of internal SQL classes
> -
>
> Key: SPARK-9763
> URL: https://issues.apache.org/jira/browse/SPARK-9763
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 1.5.0
>
>







[jira] [Commented] (SPARK-12544) Support window functions in SQLContext

2016-02-12 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145442#comment-15145442
 ] 

Davies Liu commented on SPARK-12544:


[~hvanhovell] Do window functions still require HiveContext? Or should we 
update the docs/comments for window functions?
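
For context, the kind of window-function usage in question, as a minimal sketch assuming an existing DataFrame {{df}} (per this ticket's Fix Version it is expected to work against a plain {{SQLContext}} in 2.0):

{code}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Rank rows within each department by salary (column names are made up).
val w = Window.partitionBy("dept").orderBy(desc("salary"))
val ranked = df.withColumn("rank", rank().over(w))
{code}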

> Support window functions in SQLContext
> --
>
> Key: SPARK-12544
> URL: https://issues.apache.org/jira/browse/SPARK-12544
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Herman van Hovell
> Fix For: 2.0.0
>
>







[jira] [Commented] (SPARK-7367) spark-submit CLI --help -h overrides the application arguments

2016-02-12 Thread bimal tandel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145490#comment-15145490
 ] 

bimal tandel commented on SPARK-7367:
-

I had the same problem today and wrote a patch for it. I am creating a pull 
request.

Based on my analysis, there is an unintended consequence of printing help only 
if --help is the only argument passed.

For example, spark-submit --verbose --help won't print help anymore.

Instead it prints this:
spark-submit --verbose --help
Error: Must specify a primary resource (JAR or Python or R file)
Run with --help for usage help or --verbose for debug output


> spark-submit CLI --help -h overrides the application arguments
> --
>
> Key: SPARK-7367
> URL: https://issues.apache.org/jira/browse/SPARK-7367
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Reporter: Gianmario Spacagna
>Priority: Minor
>
> The spark-submit script will parse the --help argument even if it is provided as 
> an application argument.
> E.g. 
> spark-submit --master local[*] --driver-memory 4G --class bar.foo.MyClass 
> /MyLocalJAR.jar --help
> or
> spark-submit --master local[*] --driver-memory 4G --class bar.foo.MyClass 
> /MyLocalJAR.jar -h
> If my application is using a parsing library, such as Scallop, then it will 
> never be able to run the application with --help as an argument.
> I think the spark-submit script should only print the help message when it is 
> provided as the only argument, like this:
> spark-submit --help
> or it should provide a separator for trailing arguments:
> spark-submit --master local[*] --driver-memory 4G --class bar.foo.MyClass 
> /MyLocalJAR.jar -- --help --arg1 --arg2






[jira] [Commented] (SPARK-12544) Support window functions in SQLContext

2016-02-12 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145581#comment-15145581
 ] 

Davies Liu commented on SPARK-12544:


We are retiring HiveContext in 2.0, so we may update the docs together.

> Support window functions in SQLContext
> --
>
> Key: SPARK-12544
> URL: https://issues.apache.org/jira/browse/SPARK-12544
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Herman van Hovell
>  Labels: releasenotes
> Fix For: 2.0.0
>
>







[jira] [Commented] (SPARK-7367) spark-submit CLI --help -h overrides the application arguments

2016-02-12 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145651#comment-15145651
 ] 

Marcelo Vanzin commented on SPARK-7367:
---

I think this bug has been fixed since 1.4.

help in app args:
{noformat}
$ spark-submit --master local --class MyClass /my.jar --help
Exception in thread "main" scala.MatchError: List(--help) (of class 
scala.collection.immutable.$colon$colon)
{noformat}


help in spark args:
{noformat}
$ spark-submit --master local --class MyClass --help /my.jar
Usage: spark-submit [options]  [app arguments]
...
{noformat}


> spark-submit CLI --help -h overrides the application arguments
> --
>
> Key: SPARK-7367
> URL: https://issues.apache.org/jira/browse/SPARK-7367
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Reporter: Gianmario Spacagna
>Priority: Minor
>
> The spark-submit script will parse the --help argument even if it is provided as 
> an application argument.
> E.g. 
> spark-submit --master local[*] --driver-memory 4G --class bar.foo.MyClass 
> /MyLocalJAR.jar --help
> or
> spark-submit --master local[*] --driver-memory 4G --class bar.foo.MyClass 
> /MyLocalJAR.jar -h
> If my application is using a parsing library, such as Scallop, then it will 
> never be able to run the application with --help as an argument.
> I think the spark-submit script should only print the help message when it is 
> provided as the only argument, like this:
> spark-submit --help
> or it should provide a separator for trailing arguments:
> spark-submit --master local[*] --driver-memory 4G --class bar.foo.MyClass 
> /MyLocalJAR.jar -- --help --arg1 --arg2






[jira] [Commented] (SPARK-13257) Refine naive Bayes example code

2016-02-12 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145720#comment-15145720
 ] 

Apache Spark commented on SPARK-13257:
--

User 'movelikeriver' has created a pull request for this issue:
https://github.com/apache/spark/pull/11192

> Refine naive Bayes example code
> ---
>
> Key: SPARK-13257
> URL: https://issues.apache.org/jira/browse/SPARK-13257
> Project: Spark
>  Issue Type: Improvement
>  Components: Examples
>Reporter: Lenjoy Lin
>Priority: Minor
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> 1. Add code to check the model after loading it.
> 2. It would be nice if the usage command line could be added to the comments.
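
For point 1, a hedged sketch of what such a check could look like, assuming the example's existing {{sc}}, {{model}} and save path; the fields compared are illustrative:

{code}
import org.apache.spark.mllib.classification.NaiveBayesModel

// Reload the saved model and sanity-check it against the in-memory one.
// The path and the exact fields compared are assumptions for illustration.
val sameModel = NaiveBayesModel.load(sc, "target/tmp/myNaiveBayesModel")
assert(model.labels.sameElements(sameModel.labels))
assert(model.pi.sameElements(sameModel.pi))
{code}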






[jira] [Commented] (SPARK-12154) Upgrade to Jersey 2

2016-02-12 Thread Matt Cheah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145752#comment-15145752
 ] 

Matt Cheah commented on SPARK-12154:


Sorry this had to be pushed back - but I'll work on it in the coming week.

> Upgrade to Jersey 2
> ---
>
> Key: SPARK-12154
> URL: https://issues.apache.org/jira/browse/SPARK-12154
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Affects Versions: 1.5.2
>Reporter: Matt Cheah
>
> Fairly self-explanatory: Jersey 1 is a bit old and could use an upgrade. 
> Library conflicts for Jersey are difficult to work around - see the discussion on 
> SPARK-11081. It's easier to upgrade Jersey entirely, but we should target 
> Spark 2.0 since this may be a breaking change for users who were using Jersey 1 in 
> their Spark jobs.






[jira] [Updated] (SPARK-10086) Flaky StreamingKMeans test in PySpark

2016-02-12 Thread Bryan Cutler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated SPARK-10086:
-
Attachment: flakyRepro.py

Simple script with similar operations to this StreamingKMeans test, used to 
reproduce the issue

> Flaky StreamingKMeans test in PySpark
> -
>
> Key: SPARK-10086
> URL: https://issues.apache.org/jira/browse/SPARK-10086
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark, Streaming, Tests
>Affects Versions: 1.5.0
>Reporter: Joseph K. Bradley
>Priority: Critical
> Attachments: flakyRepro.py
>
>
> Here's a report on investigating test failures in StreamingKMeans in PySpark. 
> (See Jenkins links below.)
> It is a StreamingKMeans test which trains on a DStream with 2 batches and 
> then tests on those same 2 batches.  It fails here: 
> [https://github.com/apache/spark/blob/1968276af0f681fe51328b7dd795bd21724a5441/python/pyspark/mllib/tests.py#L1144]
> I recreated the same test, with variants training on: (1) the original 2 
> batches, (2) just the first batch, (3) just the second batch, and (4) neither 
> batch.  Here is code which avoids Streaming altogether to identify what 
> batches were processed.
> {code}
> from pyspark.mllib.clustering import StreamingKMeans, StreamingKMeansModel
> batches = [[[-0.5], [0.6], [0.8]], [[0.2], [-0.1], [0.3]]]
> batches = [sc.parallelize(batch) for batch in batches]
> stkm = StreamingKMeans(decayFactor=0.0, k=2)
> stkm.setInitialCenters([[0.0], [1.0]], [1.0, 1.0])
> # Train
> def update(rdd):
> stkm._model.update(rdd, stkm._decayFactor, stkm._timeUnit)
> # Remove one or both of these lines to test skipping batches.
> update(batches[0])
> update(batches[1])
> # Test
> def predict(rdd):
> return stkm._model.predict(rdd)
> predict(batches[0]).collect()
> predict(batches[1]).collect()
> {code}
> *Results*:
> {code}
> ### EXPECTED
> [0, 1, 1] 
>   
> [1, 0, 1]
> ### Skip batch 0
> [1, 0, 0]
> [0, 1, 0]
> ### Skip batch 1
> [0, 1, 1]
> [1, 0, 1]
> ### Skip both batches  (This is what we see in the test 
> failures.)
> [0, 1, 1]
> [0, 0, 0]
> {code}
> Skipping both batches reproduces the failure.  There is no randomness in the 
> StreamingKMeans algorithm (since initial centers are fixed, not randomized).
> CC: [~tdas] [~freeman-lab] [~mengxr]
> Failure message:
> {code}
> ==
> FAIL: test_trainOn_predictOn (__main__.StreamingKMeansTest)
> Test that prediction happens on the updated model.
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/SparkPullRequestBuilder@3/python/pyspark/mllib/tests.py",
>  line 1147, in test_trainOn_predictOn
> self._eventually(condition, catch_assertions=True)
>   File 
> "/home/jenkins/workspace/SparkPullRequestBuilder@3/python/pyspark/mllib/tests.py",
>  line 123, in _eventually
> raise lastValue
>   File 
> "/home/jenkins/workspace/SparkPullRequestBuilder@3/python/pyspark/mllib/tests.py",
>  line 114, in _eventually
> lastValue = condition()
>   File 
> "/home/jenkins/workspace/SparkPullRequestBuilder@3/python/pyspark/mllib/tests.py",
>  line 1144, in condition
> self.assertEqual(predict_results, [[0, 1, 1], [1, 0, 1]])
> AssertionError: Lists differ: [[0, 1, 1], [0, 0, 0]] != [[0, 1, 1], [1, 0, 1]]
> First differing element 1:
> [0, 0, 0]
> [1, 0, 1]
> - [[0, 1, 1], [0, 0, 0]]
> ? 
> + [[0, 1, 1], [1, 0, 1]]
> ?  +++   ^
> --
> Ran 62 tests in 164.188s
> {code}






[jira] [Comment Edited] (SPARK-10086) Flaky StreamingKMeans test in PySpark

2016-02-12 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145628#comment-15145628
 ] 

Bryan Cutler edited comment on SPARK-10086 at 2/13/16 12:44 AM:


Simple script [^flakyRepro.py] with similar operations to this StreamingKMeans 
test, used to reproduce the issue


was (Author: bryanc):
Simple script with similar operations to this StreamingKMeans test, used to 
reproduce the issue

> Flaky StreamingKMeans test in PySpark
> -
>
> Key: SPARK-10086
> URL: https://issues.apache.org/jira/browse/SPARK-10086
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark, Streaming, Tests
>Affects Versions: 1.5.0
>Reporter: Joseph K. Bradley
>Priority: Critical
> Attachments: flakyRepro.py
>
>
> Here's a report on investigating test failures in StreamingKMeans in PySpark. 
> (See Jenkins links below.)
> It is a StreamingKMeans test which trains on a DStream with 2 batches and 
> then tests on those same 2 batches.  It fails here: 
> [https://github.com/apache/spark/blob/1968276af0f681fe51328b7dd795bd21724a5441/python/pyspark/mllib/tests.py#L1144]
> I recreated the same test, with variants training on: (1) the original 2 
> batches, (2) just the first batch, (3) just the second batch, and (4) neither 
> batch.  Here is code which avoids Streaming altogether to identify what 
> batches were processed.
> {code}
> from pyspark.mllib.clustering import StreamingKMeans, StreamingKMeansModel
> batches = [[[-0.5], [0.6], [0.8]], [[0.2], [-0.1], [0.3]]]
> batches = [sc.parallelize(batch) for batch in batches]
> stkm = StreamingKMeans(decayFactor=0.0, k=2)
> stkm.setInitialCenters([[0.0], [1.0]], [1.0, 1.0])
> # Train
> def update(rdd):
> stkm._model.update(rdd, stkm._decayFactor, stkm._timeUnit)
> # Remove one or both of these lines to test skipping batches.
> update(batches[0])
> update(batches[1])
> # Test
> def predict(rdd):
> return stkm._model.predict(rdd)
> predict(batches[0]).collect()
> predict(batches[1]).collect()
> {code}
> *Results*:
> {code}
> ### EXPECTED
> [0, 1, 1] 
>   
> [1, 0, 1]
> ### Skip batch 0
> [1, 0, 0]
> [0, 1, 0]
> ### Skip batch 1
> [0, 1, 1]
> [1, 0, 1]
> ### Skip both batches  (This is what we see in the test 
> failures.)
> [0, 1, 1]
> [0, 0, 0]
> {code}
> Skipping both batches reproduces the failure.  There is no randomness in the 
> StreamingKMeans algorithm (since initial centers are fixed, not randomized).
> CC: [~tdas] [~freeman-lab] [~mengxr]
> Failure message:
> {code}
> ==
> FAIL: test_trainOn_predictOn (__main__.StreamingKMeansTest)
> Test that prediction happens on the updated model.
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/SparkPullRequestBuilder@3/python/pyspark/mllib/tests.py",
>  line 1147, in test_trainOn_predictOn
> self._eventually(condition, catch_assertions=True)
>   File 
> "/home/jenkins/workspace/SparkPullRequestBuilder@3/python/pyspark/mllib/tests.py",
>  line 123, in _eventually
> raise lastValue
>   File 
> "/home/jenkins/workspace/SparkPullRequestBuilder@3/python/pyspark/mllib/tests.py",
>  line 114, in _eventually
> lastValue = condition()
>   File 
> "/home/jenkins/workspace/SparkPullRequestBuilder@3/python/pyspark/mllib/tests.py",
>  line 1144, in condition
> self.assertEqual(predict_results, [[0, 1, 1], [1, 0, 1]])
> AssertionError: Lists differ: [[0, 1, 1], [0, 0, 0]] != [[0, 1, 1], [1, 0, 1]]
> First differing element 1:
> [0, 0, 0]
> [1, 0, 1]
> - [[0, 1, 1], [0, 0, 0]]
> ? 
> + [[0, 1, 1], [1, 0, 1]]
> ?  +++   ^
> --
> Ran 62 tests in 164.188s
> {code}






[jira] [Resolved] (SPARK-13293) Generate code for Expand

2016-02-12 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-13293.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> Generate code for Expand
> 
>
> Key: SPARK-13293
> URL: https://issues.apache.org/jira/browse/SPARK-13293
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>







[jira] [Assigned] (SPARK-13308) ManagedBuffers passed to OneToOneStreamManager need to be freed in non-error cases

2016-02-12 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13308:


Assignee: Apache Spark  (was: Josh Rosen)

> ManagedBuffers passed to OneToOneStreamManager need to be freed in non-error 
> cases
> --
>
> Key: SPARK-13308
> URL: https://issues.apache.org/jira/browse/SPARK-13308
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Josh Rosen
>Assignee: Apache Spark
>
> Spark's OneToOneStreamManager does not free ManagedBuffers that are passed to 
> it except in certain error cases. Instead, ManagedBuffers should be freed 
> once messages created from them are consumed and destroyed by lower layers of 
> the Netty networking code.






[jira] [Commented] (SPARK-13308) ManagedBuffers passed to OneToOneStreamManager need to be freed in non-error cases

2016-02-12 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145727#comment-15145727
 ] 

Apache Spark commented on SPARK-13308:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/11193

> ManagedBuffers passed to OneToOneStreamManager need to be freed in non-error 
> cases
> --
>
> Key: SPARK-13308
> URL: https://issues.apache.org/jira/browse/SPARK-13308
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> Spark's OneToOneStreamManager does not free ManagedBuffers that are passed to 
> it except in certain error cases. Instead, ManagedBuffers should be freed 
> once messages created from them are consumed and destroyed by lower layers of 
> the Netty networking code.






[jira] [Assigned] (SPARK-13308) ManagedBuffers passed to OneToOneStreamManager need to be freed in non-error cases

2016-02-12 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13308:


Assignee: Josh Rosen  (was: Apache Spark)

> ManagedBuffers passed to OneToOneStreamManager need to be freed in non-error 
> cases
> --
>
> Key: SPARK-13308
> URL: https://issues.apache.org/jira/browse/SPARK-13308
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> Spark's OneToOneStreamManager does not free ManagedBuffers that are passed to 
> it except in certain error cases. Instead, ManagedBuffers should be freed 
> once messages created from them are consumed and destroyed by lower layers of 
> the Netty networking code.






[jira] [Commented] (SPARK-12154) Upgrade to Jersey 2

2016-02-12 Thread Milad Khajavi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145750#comment-15145750
 ] 

Milad Khajavi commented on SPARK-12154:
---

Hmm,
Good point about changing the pom version and checking the tests. How much time
do we have? I think I can work on it in the following week.





-- 
Milād Khājavi
http://blog.khajavi.ir
Having the source means you can do it yourself.
I tried to change the world, but I couldn’t find the source code.


> Upgrade to Jersey 2
> ---
>
> Key: SPARK-12154
> URL: https://issues.apache.org/jira/browse/SPARK-12154
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Affects Versions: 1.5.2
>Reporter: Matt Cheah
>
> Fairly self-explanatory: Jersey 1 is a bit old and could use an upgrade. 
> Library conflicts for Jersey are difficult to work around - see the discussion on 
> SPARK-11081. It's easier to upgrade Jersey entirely, but we should target 
> Spark 2.0 since this may be a breaking change for users who were using Jersey 1 in 
> their Spark jobs.






[jira] [Commented] (SPARK-7367) spark-submit CLI --help -h overrides the application arguments

2016-02-12 Thread bimal tandel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145766#comment-15145766
 ] 

bimal tandel commented on SPARK-7367:
-

I can't reproduce this on the latest release. This JIRA can be closed.

> spark-submit CLI --help -h overrides the application arguments
> --
>
> Key: SPARK-7367
> URL: https://issues.apache.org/jira/browse/SPARK-7367
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Reporter: Gianmario Spacagna
>Priority: Minor
>
> The spark-submit script will parse the --help argument even if it is provided as 
> an application argument.
> E.g. 
> spark-submit --master local[*] --driver-memory 4G --class bar.foo.MyClass 
> /MyLocalJAR.jar --help
> or
> spark-submit --master local[*] --driver-memory 4G --class bar.foo.MyClass 
> /MyLocalJAR.jar -h
> If my application is using a parsing library, such as Scallop, then it will 
> never be able to run the application with --help as an argument.
> I think the spark-submit script should only print the help message when it is 
> provided as the only argument, like this:
> spark-submit --help
> or it should provide a separator for trailing arguments:
> spark-submit --master local[*] --driver-memory 4G --class bar.foo.MyClass 
> /MyLocalJAR.jar -- --help --arg1 --arg2






[jira] [Commented] (SPARK-13297) [SQL] Backticks cannot be escaped in column names

2016-02-12 Thread Xiu (Joe) Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145566#comment-15145566
 ] 

Xiu (Joe) Guo commented on SPARK-13297:
---

Looks like in the current [master 
branch|https://github.com/apache/spark/tree/42d656814f756599a2bc426f0e1f32bd4cc4470f],
 this problem is fixed.

{code}
scala> val columnName = "col`s"
columnName: String = col`s

scala> val rows = List(Row("foo"), Row("bar"))
rows: List[org.apache.spark.sql.Row] = List([foo], [bar])

scala> val schema = StructType(Seq(StructField(columnName, StringType)))
schema: org.apache.spark.sql.types.StructType = 
StructType(StructField(col`s,StringType,true))

scala> val rdd = sc.parallelize(rows)
rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = 
ParallelCollectionRDD[0] at parallelize at <console>:28

scala> val df = sqlContext.createDataFrame(rdd, schema)
df: org.apache.spark.sql.DataFrame = [col`s: string]

scala> val selectingColumnName = "`" + columnName.replace("`", "``") + "`"
selectingColumnName: String = `col``s`

scala> selectingColumnName
res0: String = `col``s`

scala> val selectedDf = df.selectExpr(selectingColumnName)
selectedDf: org.apache.spark.sql.DataFrame = [col`s: string]

scala> selectedDf.show
+-----+
|col`s|
+-----+
|  foo|
|  bar|
+-----+
{code}

> [SQL] Backticks cannot be escaped in column names
> -
>
> Key: SPARK-13297
> URL: https://issues.apache.org/jira/browse/SPARK-13297
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Grzegorz Chilkiewicz
>Priority: Minor
>
> We want to use backticks to escape spaces & minus signs in column names.
> Are we unable to escape backticks when a column name is surrounded by 
> backticks?
> It is not documented in: 
> http://spark.apache.org/docs/latest/sql-programming-guide.html
> In MySQL there is a way: double the backticks, but this trick doesn't work in 
> Spark-SQL.
> Am I correct or just missing something? Is there a way to escape backticks 
> inside a column name when it is surrounded by backticks?
> Code to reproduce the problem:
> https://github.com/grzegorz-chilkiewicz/SparkSqlEscapeBacktick






[jira] [Commented] (SPARK-10086) Flaky StreamingKMeans test in PySpark

2016-02-12 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145625#comment-15145625
 ] 

Bryan Cutler commented on SPARK-10086:
--

I was able to track down the cause of these failures, so here is an update with 
what I found.  The test {{StreamingKMeansTest.test_trainOn_predictOn}} has 2 
{{DStream.foreachRDD}} output operations, 1 in the call to 
{{StreamingKMeans.trainOn}} and 1 with {{collect}} which has a parent 
{{DStream}} that is a {{PythonTransformedDStream}} returned from 
{{StreamingKMeans.predictOn}}, so 2 jobs are generated for each batch.  When 
the {{DStream}} jobs are generated, there is nothing to compute for the first 
job, which updates the model.  For generating the second job, 
{{PythonTransformedDStream.compute}} gets called which will then do a 
{{PythonTransformFunction}} callback that creates a {{PythonRDD}} and 
serializes the mapped predict function to a command, containing the current 
model.  

Next, the 2 jobs are scheduled in order - first to update the model and then 
collect the predicted result.  At this point, there is a race condition between 
completing the model update and generating the next set of jobs, which is 
running in a different thread.  If there is enough of a delay in the update, 
then the next set of jobs will be generated and the old model will be 
serialized to the {{PythonRDD}} command again.  Finally, the predict will be 
run against this old model causing the test failure.  

To sum it up, the underlying issue is that a func can be serialized with a 
value before a job is run that updates this value.  This doesn't appear to be 
an issue in the Scala code as the closure cleaner is run just before the job is 
executed, and it will get the updated values.

So far, the best solution I can think of would be to somehow delay the 
serialization of the model until it is needed, but I believe this would involve 
some big changes in {{PythonRDD}} as would any other solutions I could think 
of.  Is this something that would be worth doing to correct this, or might there be 
an easier fix that I am not seeing?  It's not just a {{StreamingKMeans}} issue, 
so it would affect any PySpark streaming application with a similar structure.

I am attaching some simplified code used to reproduce the issue.  I also have a 
similar Scala version that produces the expected results.
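
A toy, non-Spark illustration of the ordering problem described above (purely illustrative):

{code}
// Toy illustration only (not Spark code). If the predict function captures a
// snapshot of the model when the job is generated, a later update is invisible;
// if it reads the model when the job actually runs, it sees the update.
var model: Vector[Double] = Vector(0.0)   // stands in for the current centers
val capturedEarly = { val snapshot = model; (x: Double) => snapshot.head + x }
val readAtRunTime = (x: Double) => model.head + x
model = Vector(1.0)                       // the "trainOn" update
capturedEarly(0.0)   // 0.0 -- stale value, like the failing PySpark case
readAtRunTime(0.0)   // 1.0 -- updated value, like the Scala behaviour described
{code}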

> Flaky StreamingKMeans test in PySpark
> -
>
> Key: SPARK-10086
> URL: https://issues.apache.org/jira/browse/SPARK-10086
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark, Streaming, Tests
>Affects Versions: 1.5.0
>Reporter: Joseph K. Bradley
>Priority: Critical
>
> Here's a report on investigating test failures in StreamingKMeans in PySpark. 
> (See Jenkins links below.)
> It is a StreamingKMeans test which trains on a DStream with 2 batches and 
> then tests on those same 2 batches.  It fails here: 
> [https://github.com/apache/spark/blob/1968276af0f681fe51328b7dd795bd21724a5441/python/pyspark/mllib/tests.py#L1144]
> I recreated the same test, with variants training on: (1) the original 2 
> batches, (2) just the first batch, (3) just the second batch, and (4) neither 
> batch.  Here is code which avoids Streaming altogether to identify what 
> batches were processed.
> {code}
> from pyspark.mllib.clustering import StreamingKMeans, StreamingKMeansModel
> batches = [[[-0.5], [0.6], [0.8]], [[0.2], [-0.1], [0.3]]]
> batches = [sc.parallelize(batch) for batch in batches]
> stkm = StreamingKMeans(decayFactor=0.0, k=2)
> stkm.setInitialCenters([[0.0], [1.0]], [1.0, 1.0])
> # Train
> def update(rdd):
> stkm._model.update(rdd, stkm._decayFactor, stkm._timeUnit)
> # Remove one or both of these lines to test skipping batches.
> update(batches[0])
> update(batches[1])
> # Test
> def predict(rdd):
> return stkm._model.predict(rdd)
> predict(batches[0]).collect()
> predict(batches[1]).collect()
> {code}
> *Results*:
> {code}
> ### EXPECTED
> [0, 1, 1] 
>   
> [1, 0, 1]
> ### Skip batch 0
> [1, 0, 0]
> [0, 1, 0]
> ### Skip batch 1
> [0, 1, 1]
> [1, 0, 1]
> ### Skip both batches  (This is what we see in the test 
> failures.)
> [0, 1, 1]
> [0, 0, 0]
> {code}
> Skipping both batches reproduces the failure.  There is no randomness in the 
> StreamingKMeans algorithm (since initial centers are fixed, not randomized).
> CC: [~tdas] [~freeman-lab] [~mengxr]
> Failure message:
> {code}
> ==
> FAIL: test_trainOn_predictOn (__main__.StreamingKMeansTest)
> Test that prediction happens on the updated model.
> 

[jira] [Created] (SPARK-13308) ManagedBuffers passed to OneToOneStreamManager need to be freed in non-error cases

2016-02-12 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-13308:
--

 Summary: ManagedBuffers passed to OneToOneStreamManager need to be 
freed in non-error cases
 Key: SPARK-13308
 URL: https://issues.apache.org/jira/browse/SPARK-13308
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Josh Rosen
Assignee: Josh Rosen


Spark's OneToOneStreamManager does not free ManagedBuffers that are passed to 
it except in certain error cases. Instead, ManagedBuffers should be freed once 
messages created from them are consumed and destroyed by lower layers of the 
Netty networking code.
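
A hedged sketch of the general pattern described (free the buffer once the Netty write completes); illustrative only, not the actual patch:

{code}
import io.netty.channel.{Channel, ChannelFuture, ChannelFutureListener}
import org.apache.spark.network.buffer.ManagedBuffer

// Illustrative only: release the ManagedBuffer once the message built from it
// has been fully written out by the lower Netty layers.
def writeAndFree(channel: Channel, message: AnyRef, buffer: ManagedBuffer): Unit = {
  channel.writeAndFlush(message).addListener(new ChannelFutureListener {
    override def operationComplete(future: ChannelFuture): Unit = {
      buffer.release()
    }
  })
}
{code}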






[jira] [Updated] (SPARK-12154) Upgrade to Jersey 2

2016-02-12 Thread Andrew Ash (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ash updated SPARK-12154:
---
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-11806

> Upgrade to Jersey 2
> ---
>
> Key: SPARK-12154
> URL: https://issues.apache.org/jira/browse/SPARK-12154
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Affects Versions: 1.5.2
>Reporter: Matt Cheah
>
> Fairly self-explanatory: Jersey 1 is a bit old and could use an upgrade. 
> Library conflicts for Jersey are difficult to work around - see the discussion on 
> SPARK-11081. It's easier to upgrade Jersey entirely, but we should target 
> Spark 2.0 since this may be a breaking change for users who were using Jersey 1 in 
> their Spark jobs.






[jira] [Commented] (SPARK-12154) Upgrade to Jersey 2

2016-02-12 Thread Andrew Ash (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145746#comment-15145746
 ] 

Andrew Ash commented on SPARK-12154:


[~khajavi] would you please give it a go?  [~mcheah] must have been busy over 
the past couple weeks.

I think the way to get started would be to change jersey.version from 1.9 to 
the new version in the main {{/pom.xml}} and work through the resulting 
compile/test failures.

These sections of the official Jersey documentation might be useful as you get 
started:
https://jersey.java.net/documentation/latest/migration.html#mig-1.x
https://jersey.java.net/nonav/documentation/2.0/migration.html

> Upgrade to Jersey 2
> ---
>
> Key: SPARK-12154
> URL: https://issues.apache.org/jira/browse/SPARK-12154
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Affects Versions: 1.5.2
>Reporter: Matt Cheah
>
> Fairly self-explanatory: Jersey 1 is a bit old and could use an upgrade. 
> Library conflicts for Jersey are difficult to work around - see the discussion on 
> SPARK-11081. It's easier to upgrade Jersey entirely, but we should target 
> Spark 2.0 since this may be a breaking change for users who were using Jersey 1 in 
> their Spark jobs.





