[jira] [Updated] (SPARK-2290) Do not send SPARK_HOME from driver to executors

2014-08-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2290:
---

Summary: Do not send SPARK_HOME from driver to executors  (was: Do not send 
SPARK_HOME from workers to executors)

> Do not send SPARK_HOME from driver to executors
> ---
>
> Key: SPARK-2290
> URL: https://issues.apache.org/jira/browse/SPARK-2290
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: YanTang Zhai
>Assignee: Patrick Wendell
>
> The client path is /data/home/spark/test/spark-1.0.0, while the worker deploy 
> path is /data/home/spark/spark-1.0.0, which differs from the client path. 
> An application is then launched using ./bin/spark-submit --class 
> JobTaskJoin --master spark://172.25.38.244:7077 --executor-memory 128M 
> ../jobtaskjoin_2.10-1.0.0.jar. However, the application fails because the 
> following exception occurs:
> java.io.IOException: Cannot run program 
> "/data/home/spark/test/spark-1.0.0-bin-0.20.2-cdh3u3/bin/compute-classpath.sh"
>  (in directory "."): error=2, No such file or directory
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
> at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:759)
> at 
> org.apache.spark.deploy.worker.CommandUtils$.buildJavaOpts(CommandUtils.scala:72)
> at 
> org.apache.spark.deploy.worker.CommandUtils$.buildCommandSeq(CommandUtils.scala:37)
> at 
> org.apache.spark.deploy.worker.ExecutorRunner.getCommandSeq(ExecutorRunner.scala:109)
> at 
> org.apache.spark.deploy.worker.ExecutorRunner.fetchAndRunExecutor(ExecutorRunner.scala:124)
> at 
> org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:58)
> Caused by: java.io.IOException: error=2, No such file or directory
> at java.lang.UNIXProcess.forkAndExec(Native Method)
> at java.lang.UNIXProcess.(UNIXProcess.java:135)
> at java.lang.ProcessImpl.start(ProcessImpl.java:130)
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1021)
> ... 6 more
> Therefore, I think the worker should not use appDesc.sparkHome when handling 
> LaunchExecutor; instead, the worker could use its own sparkHome directly.
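As an illustration of the proposal only (not the actual patch; the names below are made up), the idea is simply to prefer the worker's local installation over the path shipped in the application description:

{code}
// Hypothetical sketch of the proposed behavior change; this is not the real
// Worker/ExecutorRunner code.
def resolveExecutorSparkHome(workerSparkHome: String, appSparkHome: Option[String]): String = {
  // Current behavior (roughly): trust the path shipped from the client/driver,
  // e.g. appSparkHome.getOrElse(workerSparkHome), which breaks when the client
  // and the worker use different deploy paths.
  // Proposed behavior: always use the worker's own installation path.
  workerSparkHome
}
{code}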






[jira] [Updated] (SPARK-2454) Separate driver spark home from executor spark home

2014-08-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2454:
---

Assignee: Andrew Or

> Separate driver spark home from executor spark home
> ---
>
> Key: SPARK-2454
> URL: https://issues.apache.org/jira/browse/SPARK-2454
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.1
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 1.1.0
>
>
> The driver may not always share the same directory structure as the 
> executors. It makes little sense to always re-use the driver's spark home on 
> the executors.
> https://github.com/apache/spark/pull/1244/ is an open effort to fix this. 
> However, this still requires us to set SPARK_HOME on all the executor nodes. 
> Really we should separate this out into something like `spark.executor.home` 
> and `spark.driver.home` rather than re-using SPARK_HOME everywhere.
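A minimal spark-shell-style sketch of how the proposed settings might be supplied, assuming the `spark.executor.home` and `spark.driver.home` keys suggested above (they are proposals, not options Spark currently understands):

{code}
import org.apache.spark.SparkConf

// Sketch only: the two *.home keys are the ones proposed in this issue.
val conf = new SparkConf()
  .setAppName("example")
  .set("spark.driver.home", "/data/home/spark/test/spark-1.0.0") // driver-side install
  .set("spark.executor.home", "/data/home/spark/spark-1.0.0")    // executor-side install
{code}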






[jira] [Commented] (SPARK-2489) Unsupported parquet datatype optional fixed_len_byte_array

2014-08-02 Thread Joseph Su (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083483#comment-14083483
 ] 

Joseph Su commented on SPARK-2489:
--

https://github.com/apache/spark/pull/1737

> Unsupported parquet datatype optional fixed_len_byte_array
> --
>
> Key: SPARK-2489
> URL: https://issues.apache.org/jira/browse/SPARK-2489
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Pei-Lun Lee
>
> tested against commit 9fe693b5
> {noformat}
> scala> sqlContext.parquetFile("/tmp/foo")
> java.lang.RuntimeException: Unsupported parquet datatype optional 
> fixed_len_byte_array(4) b
>   at scala.sys.package$.error(package.scala:27)
>   at 
> org.apache.spark.sql.parquet.ParquetTypesConverter$.toPrimitiveDataType(ParquetTypes.scala:58)
>   at 
> org.apache.spark.sql.parquet.ParquetTypesConverter$.toDataType(ParquetTypes.scala:109)
>   at 
> org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$convertToAttributes$1.apply(ParquetTypes.scala:282)
>   at 
> org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$convertToAttributes$1.apply(ParquetTypes.scala:279)
> {noformat}
> example avro schema
> {noformat}
> protocol Test {
> fixed Bytes4(4);
> record Foo {
> union {null, Bytes4} b;
> }
> }
> {noformat}
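One possible direction, sketched here with stand-in types since the real conversion lives in ParquetTypesConverter.toPrimitiveDataType: map FIXED_LEN_BYTE_ARRAY to a binary column type instead of throwing.

{code}
// Self-contained sketch; SqlType stands in for Spark SQL's internal data types.
sealed trait SqlType
case object IntegerType extends SqlType
case object BinaryType extends SqlType

def toPrimitiveDataType(parquetTypeName: String): SqlType = parquetTypeName match {
  case "INT32"                           => IntegerType
  case "BINARY" | "FIXED_LEN_BYTE_ARRAY" => BinaryType  // proposed: stop erroring here
  case other                             => sys.error(s"Unsupported parquet datatype $other")
}
{code}

Whether a fixed-length byte array should become a plain binary type or something length-aware is an open design question; the sketch only shows where the missing case would go.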






[jira] [Updated] (SPARK-2773) Shuffle:use growth rate to predict if need to spill

2014-08-02 Thread uncleGen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

uncleGen updated SPARK-2773:


Description: Right now, Spark uses each thread's total "shuffle" memory usage 
to predict whether it needs to spill. I think this is not very reasonable. For 
example, suppose there are two threads pulling "shuffle" data and the total 
memory available for buffering is 21G. Spilling is first triggered when one 
thread has used 7G to buffer "shuffle" data (assuming the other thread has used 
the same amount). Unfortunately, 7G is still left unused. So I think the current 
prediction mode is too conservative and cannot maximize the usage of "shuffle" 
memory. In my solution, I use the growth rate of the "shuffle" memory instead. 
The growth per step is limited, maybe 10K * 1024 (my assumption), so spilling is 
first triggered when the remaining "shuffle" memory is less than threads * 
growth * 2, i.e. 2 * 2 * 10M. I think this can maximize the usage of "shuffle" 
memory. Any suggestions?

> Shuffle:use growth rate to predict if need to spill
> ---
>
> Key: SPARK-2773
> URL: https://issues.apache.org/jira/browse/SPARK-2773
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 0.9.0, 1.0.0
>Reporter: uncleGen
>Priority: Minor
>
> Right now, Spark uses each thread's total "shuffle" memory usage to predict 
> whether it needs to spill. I think this is not very reasonable. For example, 
> suppose there are two threads pulling "shuffle" data and the total memory 
> available for buffering is 21G. Spilling is first triggered when one thread 
> has used 7G to buffer "shuffle" data (assuming the other thread has used the 
> same amount). Unfortunately, 7G is still left unused. So I think the current 
> prediction mode is too conservative and cannot maximize the usage of "shuffle" 
> memory. In my solution, I use the growth rate of the "shuffle" memory instead. 
> The growth per step is limited, maybe 10K * 1024 (my assumption), so spilling 
> is first triggered when the remaining "shuffle" memory is less than threads * 
> growth * 2, i.e. 2 * 2 * 10M. I think this can maximize the usage of "shuffle" 
> memory. Any suggestions?
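A rough sketch of the proposed trigger condition, with illustrative names only (none of these correspond to actual Spark identifiers):

{code}
// Spill once the remaining pool could be exhausted within roughly two more
// steps if every active thread keeps growing at the recently observed rate.
def shouldSpill(remainingShuffleMemory: Long,
                activeThreads: Int,
                observedGrowthPerStep: Long): Boolean = {
  remainingShuffleMemory < activeThreads.toLong * observedGrowthPerStep * 2
}
{code}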






[jira] [Commented] (SPARK-2579) Reading from S3 returns an inconsistent number of items with Spark 0.9.1

2014-08-02 Thread Eemil Lagerspetz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083484#comment-14083484
 ] 

Eemil Lagerspetz commented on SPARK-2579:
-

Hi,
I have tried with 1.0.1. The result is the same: not the right number of partitions:
Row item counts: 12

This happens only with a recently saved set of data; for an older data set it 
works fine.
If the data set is less than 8 hours old (it may also happen with slightly older 
sets), the read result is not consistent.
This does not happen when using distcp.


> Reading from S3 returns an inconsistent number of items with Spark 0.9.1
> 
>
> Key: SPARK-2579
> URL: https://issues.apache.org/jira/browse/SPARK-2579
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 0.9.1
>Reporter: Eemil Lagerspetz
>Priority: Critical
>  Labels: hdfs, read, s3, skipping
>
> I have created a random matrix of 1M rows with 10K items on each row, 
> semicolon-separated. While reading it with Spark 0.9.1 and doing a count, I 
> consistently get less than 1M rows, and a different number every time at that 
> ( !! ). Example below:
> head -n 1 tool-generate-random-matrix*log
> ==> tool-generate-random-matrix-999158.log <==
> Row item counts: 999158
> ==> tool-generate-random-matrix.log <==
> Row item counts: 997163
> The data is split into 1000 partitions. When I download it using s3cmd sync, 
> and run the following AWK on it, I get the correct number of rows in each 
> partition (1000x1000 = 1M). What is up?
> {code:title=checkrows.sh|borderStyle=solid}
> for k in part-0*
> do
>   echo $k
>   awk -F ";" '
> NF != 1 {
>   print "Wrong number of items:",NF
> }
> END {
>   if (NR != 1000) {
> print "Wrong number of rows:",NR
>   }
> }' "$k"
> done
> {code}
> The matrix generation and counting code is below:
> {code:title=Matrix.scala|borderStyle=solid}
> package fi.helsinki.cs.nodes.matrix
> import java.util.Random
> import org.apache.spark._
> import org.apache.spark.SparkContext._
> import scala.collection.mutable.ListBuffer
> import org.apache.spark.rdd.RDD
> import org.apache.spark.storage.StorageLevel._
> object GenerateRandomMatrix {
>   def NewGeMatrix(rSeed: Int, rdd: RDD[Int], features: Int) = {
>     rdd.mapPartitions(part => part.map(xarr => {
>       val rdm = new Random(rSeed + xarr)
>       val arr = new Array[Double](features)
>       for (i <- 0 until features)
>         arr(i) = rdm.nextDouble()
>       new Row(xarr, arr)
>     }))
>   }
>   case class Row(id: Int, elements: Array[Double]) {}
>   def rowFromText(line: String) = {
>     val idarr = line.split(" ")
>     val arr = idarr(1).split(";")
>     // -1 to fix saved matrix indexing error
>     new Row(idarr(0).toInt - 1, arr.map(_.toDouble))
>   }
>   def main(args: Array[String]) {
>     val master = args(0)
>     val tasks = args(1).toInt
>     val savePath = args(2)
>     val read = args.contains("read")
>     val datapoints = 100
>     val features = 1
>     val sc = new SparkContext(master, "RandomMatrix")
>     if (read) {
>       val randomMatrix: RDD[Row] = sc.textFile(savePath, tasks).map(rowFromText).persist(MEMORY_AND_DISK)
>       println("Row item counts: " + randomMatrix.count)
>     } else {
>       val rdd = sc.parallelize(0 until datapoints, tasks)
>       val bcSeed = sc.broadcast(128)
>       /* Generating a matrix of random Doubles */
>       val randomMatrix = NewGeMatrix(bcSeed.value, rdd, features).persist(MEMORY_AND_DISK)
>       randomMatrix.map(row => row.id + " " + row.elements.mkString(";")).saveAsTextFile(savePath)
>     }
>     sc.stop
>   }
> }
> {code}
> I run this with:
> appassembler/bin/tool-generate-random-matrix master 1000 
> s3n://keys@path/to/data 1>matrix.log 2>matrix.err
> Reading from HDFS gives the right count and right number of items on each 
> row. However, I had to use the full path including the server name; just 
> /matrix does not work (it thinks I want file://):
> p="hdfs://ec2-54-188-6-77.us-west-2.compute.amazonaws.com:9000/matrix"
> appassembler/bin/tool-generate-random-matrix $( cat 
> /root/spark-ec2/cluster-url ) 1000 "$p" read 1>readmatrix.log 2>readmatrix.err






[jira] [Updated] (SPARK-2773) Shuffle:use growth rate to predict if need to spill

2014-08-02 Thread uncleGen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

uncleGen updated SPARK-2773:


Description: Right now, Spark uses each thread's total "shuffle" memory usage 
to predict whether it needs to spill. I think this is not very reasonable. For 
example, suppose there are two threads pulling "shuffle" data and the total 
memory available for buffering is 21G. Spilling is first triggered when one 
thread has used 7G to buffer "shuffle" data (assuming the other thread has used 
the same amount). Unfortunately, 7G is still left unused. So I think the current 
prediction mode is too conservative and cannot maximize the usage of "shuffle" 
memory. In my solution, I use the growth rate of the "shuffle" memory instead. 
The growth per step is limited, maybe 10K * 1024 (my assumption), so spilling is 
first triggered when the remaining "shuffle" memory is less than threads * 
growth * 2, i.e. 2 * 10M * 2. I think this can maximize the usage of "shuffle" 
memory. Any suggestions?  (was: the same text, with the example written as 
2 * 2 * 10M instead of 2 * 10M * 2.)

> Shuffle:use growth rate to predict if need to spill
> ---
>
> Key: SPARK-2773
> URL: https://issues.apache.org/jira/browse/SPARK-2773
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 0.9.0, 1.0.0
>Reporter: uncleGen
>Priority: Minor
>
> Right now, Spark uses each thread's total "shuffle" memory usage to predict 
> whether it needs to spill. I think this is not very reasonable. For example, 
> suppose there are two threads pulling "shuffle" data and the total memory 
> available for buffering is 21G. Spilling is first triggered when one thread 
> has used 7G to buffer "shuffle" data (assuming the other thread has used the 
> same amount). Unfortunately, 7G is still left unused. So I think the current 
> prediction mode is too conservative and cannot maximize the usage of "shuffle" 
> memory. In my solution, I use the growth rate of the "shuffle" memory instead. 
> The growth per step is limited, maybe 10K * 1024 (my assumption), so spilling 
> is first triggered when the remaining "shuffle" memory is less than threads * 
> growth * 2, i.e. 2 * 10M * 2. I think this can maximize the usage of "shuffle" 
> memory. Any suggestions?






[jira] [Updated] (SPARK-2773) Shuffle:use growth rate to predict if need to spill

2014-08-02 Thread uncleGen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

uncleGen updated SPARK-2773:


Description: Right now, Spark uses each thread's total "shuffle" memory usage 
to predict whether it needs to spill. I think this is not very reasonable. For 
example, suppose there are two threads pulling "shuffle" data and the total 
memory available for buffering is 21G. Spilling is first triggered when one 
thread has used 7G to buffer "shuffle" data (assuming the other thread has used 
the same amount). Unfortunately, 7G is still left unused. So I think the current 
prediction mode is too conservative and cannot maximize the usage of "shuffle" 
memory. In my solution, I use the growth rate of the "shuffle" memory instead. 
The growth per step is limited, maybe 10K * 1024 (my assumption), so spilling is 
first triggered when the remaining "shuffle" memory is less than threads * 
growth * 2, i.e. 2 * 10M * 2. I think this can maximize the usage of "shuffle" 
memory. There is also a conservative assumption in my solution, i.e. that all 
threads in one executor are pulling shuffle data at the same time. However, 
this does not have much effect, since the growth per step is limited after all. 
Any suggestions?  (was: the same text without the sentences about the 
conservative assumption.)

> Shuffle:use growth rate to predict if need to spill
> ---
>
> Key: SPARK-2773
> URL: https://issues.apache.org/jira/browse/SPARK-2773
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 0.9.0, 1.0.0
>Reporter: uncleGen
>Priority: Minor
>
> Right now, Spark uses each thread's total "shuffle" memory usage to predict 
> whether it needs to spill. I think this is not very reasonable. For example, 
> suppose there are two threads pulling "shuffle" data and the total memory 
> available for buffering is 21G. Spilling is first triggered when one thread 
> has used 7G to buffer "shuffle" data (assuming the other thread has used the 
> same amount). Unfortunately, 7G is still left unused. So I think the current 
> prediction mode is too conservative and cannot maximize the usage of "shuffle" 
> memory. In my solution, I use the growth rate of the "shuffle" memory instead. 
> The growth per step is limited, maybe 10K * 1024 (my assumption), so spilling 
> is first triggered when the remaining "shuffle" memory is less than threads * 
> growth * 2, i.e. 2 * 10M * 2. I think this can maximize the usage of "shuffle" 
> memory. There is also a conservative assumption in my solution, i.e. that all 
> threads in one executor are pulling shuffle data at the same time. However, 
> this does not have much effect, since the growth per step is limited after 
> all. Any suggestions?






[jira] [Commented] (SPARK-2454) Separate driver spark home from executor spark home

2014-08-02 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083500#comment-14083500
 ] 

Sean Owen commented on SPARK-2454:
--

(Sorry if this ends up a double-post). This change is making Jenkins fail tests 
for ExecutorRunnerTest and SparkSubmitSuite. spark.test.home does not seem to 
be set.

https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/lastFailedBuild/HADOOP_PROFILE=hadoop-2.3,label=centos/consoleFull

> Separate driver spark home from executor spark home
> ---
>
> Key: SPARK-2454
> URL: https://issues.apache.org/jira/browse/SPARK-2454
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.1
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 1.1.0
>
>
> The driver may not always share the same directory structure as the 
> executors. It makes little sense to always re-use the driver's spark home on 
> the executors.
> https://github.com/apache/spark/pull/1244/ is an open effort to fix this. 
> However, this still requires us to set SPARK_HOME on all the executor nodes. 
> Really we should separate this out into something like `spark.executor.home` 
> and `spark.driver.home` rather than re-using SPARK_HOME everywhere.






[jira] [Created] (SPARK-2802) Improve the Cassandra sample and Add a new sample for Streaming to Cassandra

2014-08-02 Thread Helena Edelson (JIRA)
Helena Edelson created SPARK-2802:
-

 Summary: Improve the Cassandra sample and Add a new sample for 
Streaming to Cassandra
 Key: SPARK-2802
 URL: https://issues.apache.org/jira/browse/SPARK-2802
 Project: Spark
  Issue Type: Improvement
Reporter: Helena Edelson
Priority: Minor









[jira] [Commented] (SPARK-2802) Improve the Cassandra sample and Add a new sample for Streaming to Cassandra

2014-08-02 Thread Helena Edelson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083554#comment-14083554
 ] 

Helena Edelson commented on SPARK-2802:
---

I'll do a PR for this.

> Improve the Cassandra sample and Add a new sample for Streaming to Cassandra
> 
>
> Key: SPARK-2802
> URL: https://issues.apache.org/jira/browse/SPARK-2802
> Project: Spark
>  Issue Type: Improvement
>Reporter: Helena Edelson
>Priority: Minor
>







[jira] [Created] (SPARK-2803) add Kafka stream feature for fetch messages from specified starting offset position

2014-08-02 Thread pengyanhong (JIRA)
pengyanhong created SPARK-2803:
--

 Summary: add Kafka stream feature for fetch messages from 
specified starting offset position
 Key: SPARK-2803
 URL: https://issues.apache.org/jira/browse/SPARK-2803
 Project: Spark
  Issue Type: New Feature
  Components: Input/Output
Reporter: pengyanhong


There are some use cases where we want to fetch messages from a specified offset 
position, for example (a rough sketch of a possible API follows the list):
* replay messages
* deal with transactions
* skip a batch of incorrect messages
* randomly fetch messages according to an index
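The sketch below is purely hypothetical; neither the TopicPartitionOffset type nor a fromOffsets-style parameter exists in KafkaUtils at this point. It is only meant to make the requested behavior concrete:

{code}
// Hypothetical illustration only; not an existing Spark Streaming API.
case class TopicPartitionOffset(topic: String, partition: Int, offset: Long)

def startFromOffsets(fromOffsets: Seq[TopicPartitionOffset]): Unit = {
  // A receiver would seek each (topic, partition) to the requested offset
  // before consuming, enabling replay, skipping bad ranges, or indexed fetches.
  fromOffsets.foreach { tpo =>
    println(s"would start ${tpo.topic}-${tpo.partition} at offset ${tpo.offset}")
  }
}
{code}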






[jira] [Commented] (SPARK-2729) Forgot to match Timestamp type in ColumnBuilder

2014-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083613#comment-14083613
 ] 

Apache Spark commented on SPARK-2729:
-

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/1738

> Forgot to match Timestamp type in ColumnBuilder
> ---
>
> Key: SPARK-2729
> URL: https://issues.apache.org/jira/browse/SPARK-2729
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Teng Qiu
> Fix For: 1.1.0
>
>
> After SPARK-2710 we can create a table in Spark SQL with ColumnType Timestamp 
> from JDBC.
> When I try
> {code}
> sqlContext.cacheTable("myJdbcTable")
> {code}
> and then
> {code}
> sqlContext.sql("select count(*) from myJdbcTable")
> {code}
> I get the exception:
> {code}
> scala.MatchError: 8 (of class java.lang.Integer)
> at 
> org.apache.spark.sql.columnar.ColumnBuilder$.apply(ColumnBuilder.scala:146)
> {code}
> I checked the code at ColumnBuilder.scala:146; it is just missing a match for 
> the Timestamp type id, so it should be easy to fix.
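A minimal, self-contained sketch of the fix idea; the real dispatch happens on column type ids in ColumnBuilder.apply, and the point is only that the Timestamp id (8 in the reported MatchError) needs its own case:

{code}
// Stand-in types for illustration; not the actual ColumnBuilder code.
sealed trait ColumnBuilderSketch
case object IntColumnBuilder extends ColumnBuilderSketch
case object TimestampColumnBuilder extends ColumnBuilderSketch

def builderFor(typeId: Int): ColumnBuilderSketch = typeId match {
  case 3 => IntColumnBuilder        // illustrative id
  case 8 => TimestampColumnBuilder  // proposed: handle the Timestamp type id
  case other => sys.error(s"Unsupported column type id $other")
}
{code}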






[jira] [Updated] (SPARK-2801) Generalize RandomRDD Generator output to generic type

2014-08-02 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-2801:
-

Assignee: Burak Yavuz

> Generalize RandomRDD Generator output to generic type
> -
>
> Key: SPARK-2801
> URL: https://issues.apache.org/jira/browse/SPARK-2801
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Burak Yavuz
>Assignee: Burak Yavuz
>Priority: Trivial
> Fix For: 1.1.0
>
>
> The RandomRDDGenerators only output RDD[Double]. The DistributionGenerator 
> will be renamed to 
> RandomDataGenerator and the output will be a generic type for the nextValue() 
> function.
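A sketch of what the generalized interface described above might look like; the method set here is an assumption based on this description, not the final API:

{code}
// nextValue() now returns a generic T instead of always Double.
trait RandomDataGenerator[T] extends Serializable {
  def nextValue(): T
  def setSeed(seed: Long): Unit
}

class UniformGenerator extends RandomDataGenerator[Double] {
  private val rng = new java.util.Random()
  override def nextValue(): Double = rng.nextDouble()
  override def setSeed(seed: Long): Unit = rng.setSeed(seed)
}
{code}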






[jira] [Closed] (SPARK-2801) Generalize RandomRDD Generator output to generic type

2014-08-02 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng closed SPARK-2801.


   Resolution: Fixed
Fix Version/s: 1.1.0

> Generalize RandomRDD Generator output to generic type
> -
>
> Key: SPARK-2801
> URL: https://issues.apache.org/jira/browse/SPARK-2801
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Burak Yavuz
>Assignee: Burak Yavuz
>Priority: Trivial
> Fix For: 1.1.0
>
>
> The RandomRDDGenerators only output RDD[Double]. The DistributionGenerator 
> will be renamed to 
> RandomDataGenerator and the output will be a generic type for the nextValue() 
> function.






[jira] [Resolved] (SPARK-2316) StorageStatusListener should avoid O(blocks) operations

2014-08-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2316.


Resolution: Fixed

This was fixed via:
https://github.com/apache/spark/pull/1679

> StorageStatusListener should avoid O(blocks) operations
> ---
>
> Key: SPARK-2316
> URL: https://issues.apache.org/jira/browse/SPARK-2316
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 1.0.0
>Reporter: Patrick Wendell
>Assignee: Andrew Or
>Priority: Critical
> Fix For: 1.1.0
>
>
> In the case where jobs are frequently causing dropped blocks, the storage 
> status listener can become a bottleneck. This is slow for a few reasons, one 
> being that we use Scala collection operations, the other being that we perform 
> operations that are O(number of blocks). I think using a few indices here 
> could make this much faster.
> {code}
>  at java.lang.Integer.valueOf(Integer.java:642)
> at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:70)
> at 
> org.apache.spark.storage.StorageUtils$$anonfun$9.apply(StorageUtils.scala:82)
> at 
> scala.collection.TraversableLike$$anonfun$groupBy$1.apply(TraversableLike.scala:328)
> at 
> scala.collection.TraversableLike$$anonfun$groupBy$1.apply(TraversableLike.scala:327)
> at 
> scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
> at 
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
> at 
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
> at 
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
> at 
> scala.collection.TraversableLike$class.groupBy(TraversableLike.scala:327)
> at scala.collection.AbstractTraversable.groupBy(Traversable.scala:105)
> at 
> org.apache.spark.storage.StorageUtils$.rddInfoFromStorageStatus(StorageUtils.scala:82)
> at 
> org.apache.spark.ui.storage.StorageListener.updateRDDInfo(StorageTab.scala:56)
> at 
> org.apache.spark.ui.storage.StorageListener.onTaskEnd(StorageTab.scala:67)
> - locked <0xa27ebe30> (a 
> org.apache.spark.ui.storage.StorageListener)
> {code}
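A sketch of the "few indices" idea, with illustrative names rather than the actual StorageStatusListener code: keep blocks grouped by RDD id as they are updated, so no per-event groupBy over every block is needed.

{code}
import scala.collection.mutable

// Illustrative index: block sizes grouped by RDD id, maintained incrementally.
class BlockIndexSketch {
  private val blocksByRdd = mutable.HashMap.empty[Int, mutable.HashMap[String, Long]]

  def updateBlock(rddId: Int, blockId: String, size: Long): Unit = {
    val rddBlocks = blocksByRdd.getOrElseUpdate(rddId, mutable.HashMap.empty)
    if (size > 0) rddBlocks(blockId) = size else rddBlocks.remove(blockId)
  }

  // O(blocks of one RDD) per query instead of O(all blocks).
  def memoryUsed(rddId: Int): Long =
    blocksByRdd.get(rddId).map(_.values.sum).getOrElse(0L)
}
{code}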






[jira] [Reopened] (SPARK-2675) LiveListenerBus should set higher capacity for its event queue

2014-08-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reopened SPARK-2675:



> LiveListenerBus should set higher capacity for its event queue 
> ---
>
> Key: SPARK-2675
> URL: https://issues.apache.org/jira/browse/SPARK-2675
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.0.0, 1.0.1
>Reporter: Zongheng Yang
>Assignee: Zongheng Yang
>







[jira] [Resolved] (SPARK-2675) LiveListenerBus should set higher capacity for its event queue

2014-08-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2675.


Resolution: Won't Fix

This is covered by the fix in SPARK-2316.

> LiveListenerBus should set higher capacity for its event queue 
> ---
>
> Key: SPARK-2675
> URL: https://issues.apache.org/jira/browse/SPARK-2675
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.0.0, 1.0.1
>Reporter: Zongheng Yang
>Assignee: Zongheng Yang
>







[jira] [Resolved] (SPARK-2675) LiveListenerBus should set higher capacity for its event queue

2014-08-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2675.


Resolution: Not a Problem

> LiveListenerBus should set higher capacity for its event queue 
> ---
>
> Key: SPARK-2675
> URL: https://issues.apache.org/jira/browse/SPARK-2675
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.0.0, 1.0.1
>Reporter: Zongheng Yang
>Assignee: Zongheng Yang
>







[jira] [Resolved] (SPARK-2012) PySpark StatCounter with numpy arrays

2014-08-02 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-2012.
---

   Resolution: Fixed
Fix Version/s: 1.1.0
 Assignee: Jeremy Freeman

> PySpark StatCounter with numpy arrays
> -
>
> Key: SPARK-2012
> URL: https://issues.apache.org/jira/browse/SPARK-2012
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 1.0.0
>Reporter: Jeremy Freeman
>Assignee: Jeremy Freeman
>Priority: Minor
> Fix For: 1.1.0
>
>
> In Spark 0.9, the PySpark version of StatCounter worked with an RDD of numpy 
> arrays just as with an RDD of scalars, which was very useful (e.g. for 
> computing stats on a set of vectors in ML analyses). In 1.0.0 this broke 
> because the added functionality for computing the minimum and maximum, as 
> implemented, doesn't work on arrays.
> I have a PR ready that re-enables this functionality by having StatCounter 
> use the numpy element-wise functions "maximum" and "minimum", which work on 
> both numpy arrays and scalars (and I've added new tests for this capability). 
> However, I realize this adds a dependency on NumPy outside of MLLib. If 
> that's not ok, maybe it'd be worth adding this functionality as a util within 
> PySpark MLLib?






[jira] [Resolved] (SPARK-2799) Simplify some Scala operations for clarity, performance

2014-08-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-2799.
--

Resolution: Won't Fix

Too much risk of merge conflicts later -- not going to do this in a big-bang 
change.

> Simplify some Scala operations for clarity, performance
> ---
>
> Key: SPARK-2799
> URL: https://issues.apache.org/jira/browse/SPARK-2799
> Project: Spark
>  Issue Type: Improvement
>  Components: Examples, GraphX, Spark Core, Streaming
>Affects Versions: 1.0.1
>Reporter: Sean Owen
>Priority: Minor
>
> For fun, here's a last minor suggestion for consideration before the 1.1 
> window closes.
> There are a number of instances where Scala operations can be simplified, for 
> clarity or even performance.
> For example getOrElse(null) can be orNull. filter(...).size can be faster as 
> count(...). An expression like "!...someLongChain...isEmpty" can be "... 
> .nonEmpty". And so on.
> These are from code inspections. The PR will show the changes broken down by 
> type in separate commits. They should be quite safe.
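A few of the simplifications mentioned above, shown side by side as REPL-style snippets (all standard Scala library calls):

{code}
val maybeName: Option[String] = None
val nameOrNull = maybeName.orNull             // instead of maybeName.getOrElse(null)

val xs = Seq(1, 2, 3, 4)
val evens = xs.count(_ % 2 == 0)              // instead of xs.filter(_ % 2 == 0).size
val hasElements = xs.nonEmpty                 // instead of !xs.isEmpty
{code}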






[jira] [Resolved] (SPARK-2478) Add Python APIs for decision tree

2014-08-02 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-2478.
--

   Resolution: Fixed
Fix Version/s: 1.1.0

Issue resolved by pull request 1727
[https://github.com/apache/spark/pull/1727]

> Add Python APIs for decision tree
> -
>
> Key: SPARK-2478
> URL: https://issues.apache.org/jira/browse/SPARK-2478
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib, PySpark
>Reporter: Xiangrui Meng
>Assignee: Joseph K. Bradley
>Priority: Critical
> Fix For: 1.1.0
>
>
> In v1.0, we only support decision tree in Scala/Java. It would be nice to add 
> Python support. It may require some refactoring of the current decision tree 
> API to make it easier to construct a decision tree algorithm in Python.
> 1. Simplify decision tree constructors such that only simple types are used.
>   a. Hide the implementation of Impurity from users.
>   b. Replace enums by strings.
> 2. Make separate public decision tree classes for regression & classification 
> (with shared internals).  Eliminate algo parameter.
> 3. Implement wrappers in Python for DecisionTree.
> 4. Implement wrappers in Python for DecisionTreeModel.






[jira] [Commented] (SPARK-1812) Support cross-building with Scala 2.11

2014-08-02 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083699#comment-14083699
 ] 

Patrick Wendell commented on SPARK-1812:


[~avati] can you create sub-tasks for the individual dependency upgrades? Right 
now you have a bunch of different PR's that are all sharing the same JIRA 
title... our development workflow assumes there is a 1<->1 relationship between 
JIRA's and patches.

> Support cross-building with Scala 2.11
> --
>
> Key: SPARK-1812
> URL: https://issues.apache.org/jira/browse/SPARK-1812
> Project: Spark
>  Issue Type: New Feature
>  Components: Build, Spark Core
>Reporter: Matei Zaharia
>Assignee: Prashant Sharma
>
> Since Scala 2.10/2.11 are source compatible, we should be able to cross build 
> for both versions. From what I understand there are basically three things we 
> need to figure out:
> 1. Have two versions of our dependency graph, one that uses 2.11 
> dependencies and the other that uses 2.10 dependencies.
> 2. Figure out how to publish different poms for 2.10 and 2.11.
> I think (1) can be accomplished by having a scala 2.11 profile. (2) isn't 
> really well supported by Maven since published pom's aren't generated 
> dynamically. But we can probably script around it to make it work. I've done 
> some initial sanity checks with a simple build here:
> https://github.com/pwendell/scala-maven-crossbuild






[jira] [Updated] (SPARK-1853) Show Streaming application code context (file, line number) in Spark Stages UI

2014-08-02 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-1853:
-

Target Version/s: 1.2.0  (was: 1.1.0)

> Show Streaming application code context (file, line number) in Spark Stages UI
> --
>
> Key: SPARK-1853
> URL: https://issues.apache.org/jira/browse/SPARK-1853
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.0.0
>Reporter: Tathagata Das
>Assignee: Mubarak Seyed
> Fix For: 1.1.0
>
> Attachments: Screen Shot 2014-07-03 at 2.54.05 PM.png
>
>
> Right now, the code context (file, and line number) shown for streaming jobs 
> in stages UI is meaningless as it refers to internal DStream: 
> rather than user application file.






[jira] [Resolved] (SPARK-1981) Add AWS Kinesis streaming support

2014-08-02 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-1981.
--

   Resolution: Fixed
Fix Version/s: 1.1.0

> Add AWS Kinesis streaming support
> -
>
> Key: SPARK-1981
> URL: https://issues.apache.org/jira/browse/SPARK-1981
> Project: Spark
>  Issue Type: New Feature
>  Components: Streaming
>Reporter: Chris Fregly
>Assignee: Chris Fregly
> Fix For: 1.1.0
>
>
> Add AWS Kinesis support to Spark Streaming.
> Initial discussion occurred here: https://github.com/apache/spark/pull/223
> I discussed this with Parviz from AWS recently and we agreed that I would 
> take this over.
> Look for a new PR that takes into account all the feedback from the earlier 
> PR including spark-1.0-compliant implementation, AWS-license-aware build 
> support, tests, comments, and style guide compliance.






[jira] [Commented] (SPARK-2197) Spark invoke DecisionTree by Java

2014-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083715#comment-14083715
 ] 

Apache Spark commented on SPARK-2197:
-

User 'jkbradley' has created a pull request for this issue:
https://github.com/apache/spark/pull/1740

> Spark invoke DecisionTree by Java
> -
>
> Key: SPARK-2197
> URL: https://issues.apache.org/jira/browse/SPARK-2197
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Reporter: wulin
>Assignee: Joseph K. Bradley
>
> Strategy strategy = new Strategy(Algo.Classification(), new Impurity() {
>     @Override
>     public double calculate(double arg0, double arg1, double arg2) {
>         return Gini.calculate(arg0, arg1, arg2);
>     }
>     @Override
>     public double calculate(double arg0, double arg1) {
>         return Gini.calculate(arg0, arg1);
>     }
> }, 5, 100, QuantileStrategy.Sort(), null, 256);
> DecisionTree decisionTree = new DecisionTree(strategy);
> final DecisionTreeModel decisionTreeModel = decisionTree.train(labeledPoints.rdd());
> I try to run it on Spark, but get an error on the console:
> java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to 
> [Lorg.apache.spark.mllib.regression.LabeledPoint;
>   at 
> org.apache.spark.mllib.tree.DecisionTree$.findSplitsBins(DecisionTree.scala:990)
>   at org.apache.spark.mllib.tree.DecisionTree.train(DecisionTree.scala:56)
>   at 
> org.project.modules.spark.java.SparkDecisionTree.main(SparkDecisionTree.java:75)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Looking at the source code, I find
> val numFeatures = input.take(1)(0).features.size
> and I think this is the problem.






[jira] [Updated] (SPARK-2804) Remove scalalogging dependency in Spark SQL

2014-08-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2804:
---

Priority: Blocker  (was: Major)

> Remove scalalogging dependency in Spark SQL
> ---
>
> Key: SPARK-2804
> URL: https://issues.apache.org/jira/browse/SPARK-2804
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Patrick Wendell
>Assignee: Patrick Wendell
>Priority: Blocker
>
> For the 1.1 release we should just remove scalalogging and rely on Spark 
> logging for the SQL library.
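A minimal sketch of what "rely on Spark logging" would look like for a SQL class, assuming the org.apache.spark.Logging trait available in the 1.x line:

{code}
import org.apache.spark.Logging

// Sketch only: a class in the SQL module would mix in Spark's own Logging
// trait instead of depending on scalalogging.
class SomeCatalystComponent extends Logging {
  def run(): Unit = logInfo("using spark-core logging instead of scalalogging")
}
{code}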






[jira] [Resolved] (SPARK-1470) Use the scala-logging wrapper instead of the directly sfl4j api

2014-08-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1470.


Resolution: Won't Fix

Closed in favor of SPARK-2804

> Use the scala-logging wrapper instead of the directly sfl4j api
> ---
>
> Key: SPARK-1470
> URL: https://issues.apache.org/jira/browse/SPARK-1470
> Project: Spark
>  Issue Type: Improvement
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
> Fix For: 1.1.0
>
>
> Currently Spark Catalyst uses scalalogging-slf4j, but Spark Core uses the 
> slf4j API directly.
> We should use the scalalogging-slf4j wrapper instead of using the slf4j API 
> directly.






[jira] [Created] (SPARK-2804) Remove scalalogging dependency in Spark SQL

2014-08-02 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-2804:
--

 Summary: Remove scalalogging dependency in Spark SQL
 Key: SPARK-2804
 URL: https://issues.apache.org/jira/browse/SPARK-2804
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Patrick Wendell
Assignee: Patrick Wendell


For the 1.1 release we should just remove scalalogging and rely on Spark 
logging for the SQL library.






[jira] [Resolved] (SPARK-1842) update scala-logging-slf4j to version 2.1.2

2014-08-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1842.


Resolution: Won't Fix

Closed in favor of SPARK-2804

> update scala-logging-slf4j to version 2.1.2
> ---
>
> Key: SPARK-1842
> URL: https://issues.apache.org/jira/browse/SPARK-1842
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
>
>  scala-logging-slf4j 1.0.1 does not support Scala 2.11






[jira] [Commented] (SPARK-1449) Please delete old releases from mirroring system

2014-08-02 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083728#comment-14083728
 ] 

Sebb commented on SPARK-1449:
-

PING - is there anybody there?

> Please delete old releases from mirroring system
> 
>
> Key: SPARK-1449
> URL: https://issues.apache.org/jira/browse/SPARK-1449
> Project: Spark
>  Issue Type: Task
>Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.9.1
>Reporter: Sebb
>
> To reduce the load on the ASF mirrors, projects are required to delete old 
> releases [1]
> Please can you remove all non-current releases?
> Thanks!
> [Note that older releases are always available from the ASF archive server]
> Any links to older releases on download pages should first be adjusted to 
> point to the archive server.
> [1] http://www.apache.org/dev/release.html#when-to-archive






[jira] [Commented] (SPARK-1981) Add AWS Kinesis streaming support

2014-08-02 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083743#comment-14083743
 ] 

Nicholas Chammas commented on SPARK-1981:
-

Now that Chris's PR has been merged in, do we know if Kinesis support will be 
packaged with Spark by default starting in 1.1.0?

> Add AWS Kinesis streaming support
> -
>
> Key: SPARK-1981
> URL: https://issues.apache.org/jira/browse/SPARK-1981
> Project: Spark
>  Issue Type: New Feature
>  Components: Streaming
>Reporter: Chris Fregly
>Assignee: Chris Fregly
> Fix For: 1.1.0
>
>
> Add AWS Kinesis support to Spark Streaming.
> Initial discussion occurred here: https://github.com/apache/spark/pull/223
> I discussed this with Parviz from AWS recently and we agreed that I would 
> take this over.
> Look for a new PR that takes into account all the feedback from the earlier 
> PR including spark-1.0-compliant implementation, AWS-license-aware build 
> support, tests, comments, and style guide compliance.






[jira] [Commented] (SPARK-1812) Support cross-building with Scala 2.11

2014-08-02 Thread Anand Avati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083749#comment-14083749
 ] 

Anand Avati commented on SPARK-1812:


[~pwendell], sure I will create sub-tasks. Was not aware of this workflow 
requirement, sorry.

> Support cross-building with Scala 2.11
> --
>
> Key: SPARK-1812
> URL: https://issues.apache.org/jira/browse/SPARK-1812
> Project: Spark
>  Issue Type: New Feature
>  Components: Build, Spark Core
>Reporter: Matei Zaharia
>Assignee: Prashant Sharma
>
> Since Scala 2.10/2.11 are source compatible, we should be able to cross build 
> for both versions. From what I understand there are basically three things we 
> need to figure out:
> 1. Have two versions of our dependency graph, one that uses 2.11 
> dependencies and the other that uses 2.10 dependencies.
> 2. Figure out how to publish different poms for 2.10 and 2.11.
> I think (1) can be accomplished by having a scala 2.11 profile. (2) isn't 
> really well supported by Maven since published pom's aren't generated 
> dynamically. But we can probably script around it to make it work. I've done 
> some initial sanity checks with a simple build here:
> https://github.com/pwendell/scala-maven-crossbuild






[jira] [Commented] (SPARK-2783) Basic support for analyze in HiveContext

2014-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083755#comment-14083755
 ] 

Apache Spark commented on SPARK-2783:
-

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/1741

> Basic support for analyze in HiveContext
> 
>
> Key: SPARK-2783
> URL: https://issues.apache.org/jira/browse/SPARK-2783
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Yin Huai
>Priority: Blocker
>







[jira] [Commented] (SPARK-2785) Avoid asserts for unimplemented hive features

2014-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083757#comment-14083757
 ] 

Apache Spark commented on SPARK-2785:
-

User 'marmbrus' has created a pull request for this issue:
https://github.com/apache/spark/pull/1742

> Avoid asserts for unimplemented hive features
> -
>
> Key: SPARK-2785
> URL: https://issues.apache.org/jira/browse/SPARK-2785
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Aaron Davidson
>Assignee: Michael Armbrust
>







[jira] [Updated] (SPARK-2804) Remove scalalogging dependency in Spark SQL

2014-08-02 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-2804:


Fix Version/s: 1.1.0

> Remove scalalogging dependency in Spark SQL
> ---
>
> Key: SPARK-2804
> URL: https://issues.apache.org/jira/browse/SPARK-2804
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Patrick Wendell
>Assignee: Patrick Wendell
>Priority: Blocker
> Fix For: 1.1.0
>
>
> For the 1.1 release we should just remove scalalogging and rely on Spark 
> logging for the SQL library.






[jira] [Resolved] (SPARK-2804) Remove scalalogging dependency in Spark SQL

2014-08-02 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-2804.
-

Resolution: Fixed

Resolved here: 
https://github.com/apache/spark/commit/4c477117bb1ffef463776c86f925d35036f96b7a

> Remove scalalogging dependency in Spark SQL
> ---
>
> Key: SPARK-2804
> URL: https://issues.apache.org/jira/browse/SPARK-2804
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Patrick Wendell
>Assignee: Patrick Wendell
>Priority: Blocker
>
> For the 1.1 release we should just remove scalalogging and rely on Spark 
> logging for the SQL library.






[jira] [Commented] (SPARK-2739) Rename registerAsTable to registerTempTable

2014-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083770#comment-14083770
 ] 

Apache Spark commented on SPARK-2739:
-

User 'marmbrus' has created a pull request for this issue:
https://github.com/apache/spark/pull/1743

> Rename registerAsTable to registerTempTable
> ---
>
> Key: SPARK-2739
> URL: https://issues.apache.org/jira/browse/SPARK-2739
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>Priority: Blocker
>







[jira] [Commented] (SPARK-2797) SchemaRDDs don't support unpersist()

2014-08-02 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083775#comment-14083775
 ] 

Yin Huai commented on SPARK-2797:
-

I guess the problem is that when we create a Python SchemaRDD, we use a Scala 
SchemaRDD as the base SchemaRDD. When I look at the context.py, we are using a 
JavaRDD as the base RDD of a Python RDD. 

> SchemaRDDs don't support unpersist()
> 
>
> Key: SPARK-2797
> URL: https://issues.apache.org/jira/browse/SPARK-2797
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.0.2
>Reporter: Nicholas Chammas
>
> Looks like something simple got missed in the Java layer?
> {code}
> >>> from pyspark.sql import SQLContext
> >>> sqlContext = SQLContext(sc)
> >>> raw = sc.parallelize(['{"a": 5}'])
> >>> events = sqlContext.jsonRDD(raw)
> >>> events.printSchema()
> root
>  |-- a: IntegerType
> >>> events.cache()
> PythonRDD[45] at RDD at PythonRDD.scala:37
> >>> events.unpersist()
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/root/spark/python/pyspark/sql.py", line 440, in unpersist
> self._jschema_rdd.unpersist()
>   File "/root/spark/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 
> 537, in __call__
>   File "/root/spark/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 
> 304, in get_return_value
> py4j.protocol.Py4JError: An error occurred while calling o108.unpersist. 
> Trace:
> py4j.Py4JException: Method unpersist([]) does not exist
>   at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
>   at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
>   at py4j.Gateway.invoke(Gateway.java:251)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:207)
>   at java.lang.Thread.run(Thread.java:745)
> >>> events.unpersist
>  PythonRDD.scala:37>
> {code}
> Note that the {{unpersist}} method exists but cannot be called without 
> raising the shown error.
> This is on {{1.0.2-rc1}}.
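A hypothetical illustration of the suspected gap (class and method layout are made up for illustration): the JVM-side wrapper that Py4J calls exposes cache() but no zero-argument unpersist(), which is exactly what the Py4JError above complains about.

{code}
// Not the actual SchemaRDD bridge code; only shows the missing overload idea.
class JvmSchemaRddWrapperSketch(var cached: Boolean = false) {
  def cache(): this.type = { cached = true; this }
  def unpersist(): this.type = unpersist(blocking = true)   // the no-arg overload Py4J looks for
  def unpersist(blocking: Boolean): this.type = { cached = false; this }
}
{code}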






[jira] [Commented] (SPARK-2797) SchemaRDDs don't support unpersist()

2014-08-02 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083776#comment-14083776
 ] 

Yin Huai commented on SPARK-2797:
-

But, I do not understand why cache works but unpersist fails...

> SchemaRDDs don't support unpersist()
> 
>
> Key: SPARK-2797
> URL: https://issues.apache.org/jira/browse/SPARK-2797
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.0.2
>Reporter: Nicholas Chammas
>
> Looks like something simple got missed in the Java layer?
> {code}
> >>> from pyspark.sql import SQLContext
> >>> sqlContext = SQLContext(sc)
> >>> raw = sc.parallelize(['{"a": 5}'])
> >>> events = sqlContext.jsonRDD(raw)
> >>> events.printSchema()
> root
>  |-- a: IntegerType
> >>> events.cache()
> PythonRDD[45] at RDD at PythonRDD.scala:37
> >>> events.unpersist()
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/root/spark/python/pyspark/sql.py", line 440, in unpersist
> self._jschema_rdd.unpersist()
>   File "/root/spark/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 
> 537, in __call__
>   File "/root/spark/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 
> 304, in get_return_value
> py4j.protocol.Py4JError: An error occurred while calling o108.unpersist. 
> Trace:
> py4j.Py4JException: Method unpersist([]) does not exist
>   at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
>   at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
>   at py4j.Gateway.invoke(Gateway.java:251)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:207)
>   at java.lang.Thread.run(Thread.java:745)
> >>> events.unpersist
>  PythonRDD.scala:37>
> {code}
> Note that the {{unpersist}} method exists but cannot be called without 
> raising the shown error.
> This is on {{1.0.2-rc1}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2655) Change the default logging level to WARN

2014-08-02 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083782#comment-14083782
 ] 

Sean Owen commented on SPARK-2655:
--

(Are you proposing PRs for any of these JIRAs?)
Disabling INFO entirely seems a bit excessive, though some particularly noisy 
classes could reasonably default to WARN.
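Independently of whatever default the project picks, a user can already quiet the noise 
per application; a minimal sketch using the Log4j 1.x API that Spark ships with (the 
package names below are just examples of noisy namespaces):

{code}
import org.apache.log4j.{Level, Logger}

// Raise the root threshold so only WARN and above are printed.
Logger.getRootLogger.setLevel(Level.WARN)

// Or keep INFO globally and silence only particularly chatty namespaces.
Logger.getLogger("org.apache.spark.storage").setLevel(Level.WARN)
Logger.getLogger("org.apache.spark.scheduler").setLevel(Level.WARN)
{code}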

> Change the default logging level to WARN
> 
>
> Key: SPARK-2655
> URL: https://issues.apache.org/jira/browse/SPARK-2655
> Project: Spark
>  Issue Type: Improvement
>Reporter: Davies Liu
>
> The current default logging level, INFO, is pretty noisy; reducing this 
> unnecessary logging would give users a better experience.
> Spark is much more stable and mature than before, so users will not need that 
> much logging in normal cases. Some high-level information is still helpful, 
> such as messages about job and task progress; we could promote those important 
> messages to WARN as a hack, otherwise we would need to demote all other logging 
> to DEBUG.
> PS: it would be better to have one-line progress logging in the terminal (and in the title).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-2805) update akka to version 2.3

2014-08-02 Thread Anand Avati (JIRA)
Anand Avati created SPARK-2805:
--

 Summary: update akka to version 2.3
 Key: SPARK-2805
 URL: https://issues.apache.org/jira/browse/SPARK-2805
 Project: Spark
  Issue Type: Sub-task
Reporter: Anand Avati


akka-2.3 is the lowest version available for Scala 2.11.

akka-2.3 depends on protobuf 2.5, while Hadoop-1 requires protobuf 2.4.1. To 
reconcile the conflicting dependencies, an akka-2.3.x-shaded-protobuf artifact 
that bundles protobuf 2.5 internally needs to be released.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1997) Update breeze to version 0.8.1

2014-08-02 Thread Anand Avati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083784#comment-14083784
 ] 

Anand Avati commented on SPARK-1997:


Please re-open this issue, as the patch was reverted. The same patch is now good 
to re-apply.

Note that scalalogging was already upgraded to 2.1.2 in master after branch-1.1 
was cut, so #940 needs to be re-applied to master soon.

> Update breeze to version 0.8.1
> --
>
> Key: SPARK-1997
> URL: https://issues.apache.org/jira/browse/SPARK-1997
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
> Fix For: 1.1.0
>
>
> {{breeze 0.7}} does not support {{scala 2.11}} .



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-2806) update json4s-jackson to version 3.2.10

2014-08-02 Thread Anand Avati (JIRA)
Anand Avati created SPARK-2806:
--

 Summary: update json4s-jackson to version 3.2.10
 Key: SPARK-2806
 URL: https://issues.apache.org/jira/browse/SPARK-2806
 Project: Spark
  Issue Type: Sub-task
Reporter: Anand Avati


json4s-jackson 3.2.6 is not available in Scala 2.11



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2797) SchemaRDDs don't support unpersist()

2014-08-02 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083788#comment-14083788
 ] 

Yin Huai commented on SPARK-2797:
-

Oh, I see the problem. In the Scala RDD, unpersist takes a default parameter 
(blocking). It seems we cannot call it from Py4J without passing that parameter 
explicitly.
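To spell out why Py4J trips over this: a Scala default argument does not produce a 
zero-argument method in the compiled bytecode, so reflection only finds 
unpersist(boolean). A sketch of one possible fix, adding an explicit no-arg overload on 
a hypothetical JVM-side wrapper (illustrative only, not the actual patch):

{code}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper class for illustration; not Spark's real JavaSchemaRDD.
class PySqlFacade[T](rdd: RDD[T]) {
  // This is all Py4J can see today: a single method taking a Boolean.
  def unpersist(blocking: Boolean): RDD[T] = rdd.unpersist(blocking)

  // Explicit zero-arg overload so `self._jschema_rdd.unpersist()` resolves from Python.
  def unpersist(): RDD[T] = rdd.unpersist(blocking = true)
}
{code}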

> SchemaRDDs don't support unpersist()
> 
>
> Key: SPARK-2797
> URL: https://issues.apache.org/jira/browse/SPARK-2797
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.0.2
>Reporter: Nicholas Chammas
>
> Looks like something simple got missed in the Java layer?
> {code}
> >>> from pyspark.sql import SQLContext
> >>> sqlContext = SQLContext(sc)
> >>> raw = sc.parallelize(['{"a": 5}'])
> >>> events = sqlContext.jsonRDD(raw)
> >>> events.printSchema()
> root
>  |-- a: IntegerType
> >>> events.cache()
> PythonRDD[45] at RDD at PythonRDD.scala:37
> >>> events.unpersist()
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/root/spark/python/pyspark/sql.py", line 440, in unpersist
> self._jschema_rdd.unpersist()
>   File "/root/spark/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 
> 537, in __call__
>   File "/root/spark/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 
> 304, in get_return_value
> py4j.protocol.Py4JError: An error occurred while calling o108.unpersist. 
> Trace:
> py4j.Py4JException: Method unpersist([]) does not exist
>   at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
>   at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
>   at py4j.Gateway.invoke(Gateway.java:251)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:207)
>   at java.lang.Thread.run(Thread.java:745)
> >>> events.unpersist
>  PythonRDD.scala:37>
> {code}
> Note that the {{unpersist}} method exists but cannot be called without 
> raising the shown error.
> This is on {{1.0.2-rc1}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2807) update akka-zeromq_2.11 to version 2.3.4

2014-08-02 Thread Anand Avati (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Avati updated SPARK-2807:
---

Summary: update akka-zeromq_2.11 to version 2.3.4  (was: update kafka to 
version 0.8.1)

> update akka-zeromq_2.11 to version 2.3.4
> 
>
> Key: SPARK-2807
> URL: https://issues.apache.org/jira/browse/SPARK-2807
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Reporter: Anand Avati
>
> First akka-zeromq_2.11 2.3.4 has to be released



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-2807) update kafka to version 2.3

2014-08-02 Thread Anand Avati (JIRA)
Anand Avati created SPARK-2807:
--

 Summary: update kafka to version 2.3
 Key: SPARK-2807
 URL: https://issues.apache.org/jira/browse/SPARK-2807
 Project: Spark
  Issue Type: Sub-task
Reporter: Anand Avati


First akka-zeromq_2.11 2.3.4 has to be released



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2807) update kafka to version 0.8.1

2014-08-02 Thread Anand Avati (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Avati updated SPARK-2807:
---

Summary: update kafka to version 0.8.1  (was: update kafka to version 2.3)

> update kafka to version 0.8.1
> -
>
> Key: SPARK-2807
> URL: https://issues.apache.org/jira/browse/SPARK-2807
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Reporter: Anand Avati
>
> First akka-zeromq_2.11 2.3.4 has to be released



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-2808) update kafka to version 0.8.1

2014-08-02 Thread Anand Avati (JIRA)
Anand Avati created SPARK-2808:
--

 Summary: update kafka to version 0.8.1
 Key: SPARK-2808
 URL: https://issues.apache.org/jira/browse/SPARK-2808
 Project: Spark
  Issue Type: Sub-task
Reporter: Anand Avati


First kafka_2.11 0.8.1 has to be released



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2797) SchemaRDDs don't support unpersist()

2014-08-02 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083791#comment-14083791
 ] 

Yin Huai commented on SPARK-2797:
-

I think we also need to refactor sql.py later. Some variable names are 
misleading (for example, _jschema_rdd is a Scala SchemaRDD rather than a 
JavaSchemaRDD).

> SchemaRDDs don't support unpersist()
> 
>
> Key: SPARK-2797
> URL: https://issues.apache.org/jira/browse/SPARK-2797
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.0.2
>Reporter: Nicholas Chammas
>
> Looks like something simple got missed in the Java layer?
> {code}
> >>> from pyspark.sql import SQLContext
> >>> sqlContext = SQLContext(sc)
> >>> raw = sc.parallelize(['{"a": 5}'])
> >>> events = sqlContext.jsonRDD(raw)
> >>> events.printSchema()
> root
>  |-- a: IntegerType
> >>> events.cache()
> PythonRDD[45] at RDD at PythonRDD.scala:37
> >>> events.unpersist()
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/root/spark/python/pyspark/sql.py", line 440, in unpersist
> self._jschema_rdd.unpersist()
>   File "/root/spark/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 
> 537, in __call__
>   File "/root/spark/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 
> 304, in get_return_value
> py4j.protocol.Py4JError: An error occurred while calling o108.unpersist. 
> Trace:
> py4j.Py4JException: Method unpersist([]) does not exist
>   at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
>   at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
>   at py4j.Gateway.invoke(Gateway.java:251)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:207)
>   at java.lang.Thread.run(Thread.java:745)
> >>> events.unpersist
>  PythonRDD.scala:37>
> {code}
> Note that the {{unpersist}} method exists but cannot be called without 
> raising the shown error.
> This is on {{1.0.2-rc1}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-2809) update chill to version 0.4

2014-08-02 Thread Anand Avati (JIRA)
Anand Avati created SPARK-2809:
--

 Summary: update chill to version 0.4
 Key: SPARK-2809
 URL: https://issues.apache.org/jira/browse/SPARK-2809
 Project: Spark
  Issue Type: Sub-task
Reporter: Anand Avati


First twitter chill_2.11 0.4 has to be released



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2627) Check for PEP 8 compliance on all Python code in the Jenkins CI cycle

2014-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083795#comment-14083795
 ] 

Apache Spark commented on SPARK-2627:
-

User 'nchammas' has created a pull request for this issue:
https://github.com/apache/spark/pull/1744

> Check for PEP 8 compliance on all Python code in the Jenkins CI cycle
> -
>
> Key: SPARK-2627
> URL: https://issues.apache.org/jira/browse/SPARK-2627
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Nicholas Chammas
>
> This issue was triggered by [the discussion 
> here|https://github.com/apache/spark/pull/1505#issuecomment-49698681].
> Requirements:
> * make a linter script for Scala under {{dev/lint-scala}} that just calls 
> {{scalastyle}}
> * make a linter script for Python under {{dev/lint-python}} that calls 
> {{pep8}} on all Python files
> ** One exception to this is {{cloudpickle.py}}, which is a third-party module 
> [we don't want to 
> touch|https://github.com/apache/spark/pull/1505#discussion-diff-15197904]
> * Modify {{dev/run-tests}} to call both linter scripts
> * Incorporate these changes into the [Contributing to 
> Spark|https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark]
>  guide



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2097) UDF Support

2014-08-02 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-2097.
-

   Resolution: Fixed
Fix Version/s: 1.1.0

> UDF Support
> ---
>
> Key: SPARK-2097
> URL: https://issues.apache.org/jira/browse/SPARK-2097
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>Priority: Blocker
> Fix For: 1.1.0
>
>
> Right now we only support UDFs that are written against the Hive API or are 
> called directly as expressions in the DSL.  It would be nice to have native 
> support for registering scala/python functions as well.
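Since this is now resolved for 1.1.0, registering a plain Scala function should look 
roughly like the following sketch; the method name and exact signature here are 
illustrative of the feature rather than a guaranteed final API, and it assumes an 
existing SparkContext {{sc}} and a registered {{people}} table:

{code}
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Register an ordinary Scala closure and call it from SQL.
sqlContext.registerFunction("strLen", (s: String) => s.length)
sqlContext.sql("SELECT strLen(name) FROM people").collect()
{code}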



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-2810) update scala-maven-plugin to version 3.2.0

2014-08-02 Thread Anand Avati (JIRA)
Anand Avati created SPARK-2810:
--

 Summary: update scala-maven-plugin to version 3.2.0
 Key: SPARK-2810
 URL: https://issues.apache.org/jira/browse/SPARK-2810
 Project: Spark
  Issue Type: Sub-task
Reporter: Anand Avati


Needed for Scala 2.11 'compiler-interface'



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2784) Make language configurable using SQLConf instead of hql/sql functions

2014-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083797#comment-14083797
 ] 

Apache Spark commented on SPARK-2784:
-

User 'marmbrus' has created a pull request for this issue:
https://github.com/apache/spark/pull/1746
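The PR body is not reproduced here, but the idea is that the dialect becomes a SQLConf 
setting instead of a choice between the hql() and sql() entry points. A sketch of what 
that usage might look like; the key name "spark.sql.dialect" is an assumption, and the 
snippet presumes a HiveContext named {{hiveContext}}:

{code}
// Illustrative only: select the parser via configuration rather than separate methods.
hiveContext.setConf("spark.sql.dialect", "hiveql")
hiveContext.sql("SELECT key, value FROM src")            // parsed as HiveQL

hiveContext.setConf("spark.sql.dialect", "sql")
hiveContext.sql("SELECT key FROM src WHERE key > 10")    // parsed by the basic SQL parser
{code}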

> Make language configurable using SQLConf instead of hql/sql functions
> -
>
> Key: SPARK-2784
> URL: https://issues.apache.org/jira/browse/SPARK-2784
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2797) SchemaRDDs don't support unpersist()

2014-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083796#comment-14083796
 ] 

Apache Spark commented on SPARK-2797:
-

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/1745

> SchemaRDDs don't support unpersist()
> 
>
> Key: SPARK-2797
> URL: https://issues.apache.org/jira/browse/SPARK-2797
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.0.2
>Reporter: Nicholas Chammas
>Assignee: Yin Huai
>
> Looks like something simple got missed in the Java layer?
> {code}
> >>> from pyspark.sql import SQLContext
> >>> sqlContext = SQLContext(sc)
> >>> raw = sc.parallelize(['{"a": 5}'])
> >>> events = sqlContext.jsonRDD(raw)
> >>> events.printSchema()
> root
>  |-- a: IntegerType
> >>> events.cache()
> PythonRDD[45] at RDD at PythonRDD.scala:37
> >>> events.unpersist()
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/root/spark/python/pyspark/sql.py", line 440, in unpersist
> self._jschema_rdd.unpersist()
>   File "/root/spark/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 
> 537, in __call__
>   File "/root/spark/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 
> 304, in get_return_value
> py4j.protocol.Py4JError: An error occurred while calling o108.unpersist. 
> Trace:
> py4j.Py4JException: Method unpersist([]) does not exist
>   at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
>   at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
>   at py4j.Gateway.invoke(Gateway.java:251)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:207)
>   at java.lang.Thread.run(Thread.java:745)
> >>> events.unpersist
>  PythonRDD.scala:37>
> {code}
> Note that the {{unpersist}} method exists but cannot be called without 
> raising the shown error.
> This is on {{1.0.2-rc1}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2809) update chill to version 0.4

2014-08-02 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083798#comment-14083798
 ] 

Sean Owen commented on SPARK-2809:
--

Does it make sense to open all of these JIRAs? As you say, chill doesn't exist 
for 2.11 yet, so it's not an actionable task here. There is an implicit need for 
every library the build uses to support Scala 2.11, but couldn't that be a note 
in the parent issue?

> update chill to version 0.4
> ---
>
> Key: SPARK-2809
> URL: https://issues.apache.org/jira/browse/SPARK-2809
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Reporter: Anand Avati
>
> First twitter chill_2.11 0.4 has to be released



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-2811) update algebird to 0.7

2014-08-02 Thread Anand Avati (JIRA)
Anand Avati created SPARK-2811:
--

 Summary: update algebird to 0.7
 Key: SPARK-2811
 URL: https://issues.apache.org/jira/browse/SPARK-2811
 Project: Spark
  Issue Type: Sub-task
Reporter: Anand Avati


First algebird_2.11 0.7.0 has to be released



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2807) update akka-zeromq to version 2.3.4

2014-08-02 Thread Anand Avati (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Avati updated SPARK-2807:
---

Summary: update akka-zeromq to version 2.3.4  (was: update akka-zeromq_2.11 
to version 2.3.4)

> update akka-zeromq to version 2.3.4
> ---
>
> Key: SPARK-2807
> URL: https://issues.apache.org/jira/browse/SPARK-2807
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Reporter: Anand Avati
>
> First akka-zeromq_2.11 2.3.4 has to be released



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2785) Avoid asserts for unimplemented hive features

2014-08-02 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-2785.
-

   Resolution: Fixed
Fix Version/s: 1.1.0

> Avoid asserts for unimplemented hive features
> -
>
> Key: SPARK-2785
> URL: https://issues.apache.org/jira/browse/SPARK-2785
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Aaron Davidson
>Assignee: Michael Armbrust
> Fix For: 1.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2602) sbt/sbt test steals window focus on OS X

2014-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083802#comment-14083802
 ] 

Apache Spark commented on SPARK-2602:
-

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/1747
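The PR contents are not shown in this digest; one common remedy for focus stealing on 
OS X is simply to run the forked test JVMs headless, so AWT never creates a window at 
all. A sketch in sbt terms (standard sbt settings, not necessarily what the PR changes):

{code}
// In the sbt build definition: fork test JVMs and keep them headless.
fork in Test := true
javaOptions in Test += "-Djava.awt.headless=true"
{code}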

> sbt/sbt test steals window focus on OS X
> 
>
> Key: SPARK-2602
> URL: https://issues.apache.org/jira/browse/SPARK-2602
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Nicholas Chammas
>Priority: Minor
>
> On OS X, I run {{sbt/sbt test}} from Terminal and then go off and do 
> something else with my computer. It appears that there are several things in 
> the test suite that launch Java programs that, for some reason, steal window 
> focus. 
> It can get very annoying, especially if you happen to be typing something in 
> a different window, to be suddenly teleported to a random Java application 
> and have your finely crafted keystrokes be sent where they weren't intended.
> It would be nice if {{sbt/sbt test}} didn't do that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2712) Add a small note that mvn "package" must happen before "test"

2014-08-02 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083804#comment-14083804
 ] 

Sean Owen commented on SPARK-2712:
--

The doc he links to does mention just that? I'm confused now too.

"Some of the require Spark to be packaged first, so always run mvn package with 
-DskipTests the first time. You can then run the tests with mvn 
-Dhadoop.version=... test."

> Add a small note that mvn "package" must happen before "test"
> -
>
> Key: SPARK-2712
> URL: https://issues.apache.org/jira/browse/SPARK-2712
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 0.9.1, 1.0.0, 1.1.1
> Environment: all
>Reporter: Stephen Boesch
>Assignee: Stephen Boesch
>Priority: Trivial
>  Labels: documentation
> Fix For: 1.1.0
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Add to the building-with-maven.md:
> Requirement: build packages before running tests
> Tests must be run AFTER the "package" target has already been executed. The 
> following is an example of a correct (build, test) sequence:
> mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive clean package
> mvn -Pyarn -Phadoop-2.3 -Phive test
> BTW Reynold Xin requested this tiny doc improvement.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2414) Remove jquery

2014-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083805#comment-14083805
 ] 

Apache Spark commented on SPARK-2414:
-

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/1748

> Remove jquery
> -
>
> Key: SPARK-2414
> URL: https://issues.apache.org/jira/browse/SPARK-2414
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Reynold Xin
>Priority: Minor
>  Labels: starter
>
> SPARK-2384 introduces jquery for tooltip display. We can probably just create 
> a very simple javascript for tooltip instead of pulling in jquery. 
> https://github.com/apache/spark/pull/1314



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2809) update chill to version 0.4

2014-08-02 Thread Anand Avati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083806#comment-14083806
 ] 

Anand Avati commented on SPARK-2809:


[~srowen], Spark is currently using Twitter chill 0.3.6. A Scala 2.11-based chill 
is likely to arrive in version 0.4 (the latest 2.10-based release) or later, and 
will therefore require an update to Spark's pom.xml in any case.

> update chill to version 0.4
> ---
>
> Key: SPARK-2809
> URL: https://issues.apache.org/jira/browse/SPARK-2809
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Reporter: Anand Avati
>
> First twitter chill_2.11 0.4 has to be released



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-2812) convert maven to archetype based build

2014-08-02 Thread Anand Avati (JIRA)
Anand Avati created SPARK-2812:
--

 Summary: convert maven to archetype based build
 Key: SPARK-2812
 URL: https://issues.apache.org/jira/browse/SPARK-2812
 Project: Spark
  Issue Type: Sub-task
Reporter: Anand Avati


Needed in order to support parallel Scala 2.10 and 2.11 builds.

A build profile in pom.xml is insufficient, because it is not possible to use 
expressions/variables in the artifact names of sub-modules.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-1997) Update breeze to version 0.8.1

2014-08-02 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng reopened SPARK-1997:
--


> Update breeze to version 0.8.1
> --
>
> Key: SPARK-1997
> URL: https://issues.apache.org/jira/browse/SPARK-1997
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
> Fix For: 1.1.0
>
>
> {{breeze 0.7}} does not support {{scala 2.11}} .



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1997) Update breeze to version 0.8.1

2014-08-02 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083808#comment-14083808
 ] 

Xiangrui Meng commented on SPARK-1997:
--

I think we removed scalalogging instead of upgrading it to 2.1.2: 
https://github.com/apache/spark/commit/4c477117bb1ffef463776c86f925d35036f96b7a

> Update breeze to version 0.8.1
> --
>
> Key: SPARK-1997
> URL: https://issues.apache.org/jira/browse/SPARK-1997
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
> Fix For: 1.1.0
>
>
> {{breeze 0.7}} does not support {{scala 2.11}} .



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1997) Update breeze to version 0.8.1

2014-08-02 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083810#comment-14083810
 ] 

Xiangrui Meng commented on SPARK-1997:
--

I just checked breeze's dependency on scalalogging: 
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/util/SerializableLogging.scala

Maybe we can send a PR to breeze to implement something similar to Spark's 
Logging: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/Logging.scala
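For concreteness, "something similar to Spark's Logging" would be a small slf4j-only 
mixin, so breeze would not need the scalalogging dependency at all. A minimal sketch; 
the trait and method names are illustrative, not breeze's or Spark's actual code:

{code}
import org.slf4j.{Logger, LoggerFactory}

// Lazily initialized and transient, so classes mixing it in stay serializable.
trait SimpleLogging {
  @transient private var log_ : Logger = null

  protected def log: Logger = {
    if (log_ == null) {
      log_ = LoggerFactory.getLogger(getClass.getName.stripSuffix("$"))
    }
    log_
  }

  protected def logInfo(msg: => String): Unit = if (log.isInfoEnabled) log.info(msg)
  protected def logWarning(msg: => String): Unit = if (log.isWarnEnabled) log.warn(msg)
  protected def logError(msg: => String, t: Throwable): Unit = log.error(msg, t)
}
{code}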

> Update breeze to version 0.8.1
> --
>
> Key: SPARK-1997
> URL: https://issues.apache.org/jira/browse/SPARK-1997
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
> Fix For: 1.1.0
>
>
> {{breeze 0.7}} does not support {{scala 2.11}} .



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1997) Update breeze to version 0.8.1

2014-08-02 Thread Anand Avati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083811#comment-14083811
 ] 

Anand Avati commented on SPARK-1997:


Ah, you are right, scalalogging was removed, not updated.

In any case, can we update to breeze 0.8.1 with the old #940 patch (the goal of 
this JIRA)? The logging changes you suggest for breeze seem orthogonal to 0.7 
vs 0.8.1.

> Update breeze to version 0.8.1
> --
>
> Key: SPARK-1997
> URL: https://issues.apache.org/jira/browse/SPARK-1997
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
> Fix For: 1.1.0
>
>
> {{breeze 0.7}} does not support {{scala 2.11}} .



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1812) Support cross-building with Scala 2.11

2014-08-02 Thread Anand Avati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083812#comment-14083812
 ] 

Anand Avati commented on SPARK-1812:


[~pwendell] I have created sub-tasks for the dependencies and build. All source 
code changes (warning and error fixes due to 2.11 language changes) will 
continue against this Jira.

> Support cross-building with Scala 2.11
> --
>
> Key: SPARK-1812
> URL: https://issues.apache.org/jira/browse/SPARK-1812
> Project: Spark
>  Issue Type: New Feature
>  Components: Build, Spark Core
>Reporter: Matei Zaharia
>Assignee: Prashant Sharma
>
> Since Scala 2.10/2.11 are source compatible, we should be able to cross build 
> for both versions. From what I understand there are basically three things we 
> need to figure out:
> 1. Have two versions of our dependency graph, one that uses 2.11 
> dependencies and the other that uses 2.10 dependencies.
> 2. Figure out how to publish different poms for 2.10 and 2.11.
> I think (1) can be accomplished by having a scala 2.11 profile. (2) isn't 
> really well supported by Maven since published pom's aren't generated 
> dynamically. But we can probably script around it to make it work. I've done 
> some initial sanity checks with a simple build here:
> https://github.com/pwendell/scala-maven-crossbuild



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2806) update json4s-jackson to version 3.2.10

2014-08-02 Thread Anand Avati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083814#comment-14083814
 ] 

Anand Avati commented on SPARK-2806:


Posted https://github.com/apache/spark/pull/1702

> update json4s-jackson to version 3.2.10
> ---
>
> Key: SPARK-2806
> URL: https://issues.apache.org/jira/browse/SPARK-2806
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Reporter: Anand Avati
>
> json4s-jackson 3.2.6 is not available in Scala 2.11



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2810) update scala-maven-plugin to version 3.2.0

2014-08-02 Thread Anand Avati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083817#comment-14083817
 ] 

Anand Avati commented on SPARK-2810:


Posted https://github.com/apache/spark/pull/1711

> update scala-maven-plugin to version 3.2.0
> --
>
> Key: SPARK-2810
> URL: https://issues.apache.org/jira/browse/SPARK-2810
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Reporter: Anand Avati
>
> Needed for Scala 2.11 'compiler-interface'



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2805) update akka to version 2.3

2014-08-02 Thread Anand Avati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083813#comment-14083813
 ] 

Anand Avati commented on SPARK-2805:


Posted https://github.com/apache/spark/pull/1685

> update akka to version 2.3
> --
>
> Key: SPARK-2805
> URL: https://issues.apache.org/jira/browse/SPARK-2805
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Reporter: Anand Avati
>
> akka-2.3 is the lowest version available in Scala 2.11
> akka-2.3 depends on protobuf 2.5. Hadoop-1 requires protobuf 2.4.1. In order 
> to reconcile the conflicting dependencies, need to release 
> akka-2.3.x-shaded-protobuf artifact which has protobuf 2.5 within.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2797) SchemaRDDs don't support unpersist()

2014-08-02 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-2797.
-

   Resolution: Fixed
Fix Version/s: 1.1.0

> SchemaRDDs don't support unpersist()
> 
>
> Key: SPARK-2797
> URL: https://issues.apache.org/jira/browse/SPARK-2797
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.0.2
>Reporter: Nicholas Chammas
>Assignee: Yin Huai
> Fix For: 1.1.0
>
>
> Looks like something simple got missed in the Java layer?
> {code}
> >>> from pyspark.sql import SQLContext
> >>> sqlContext = SQLContext(sc)
> >>> raw = sc.parallelize(['{"a": 5}'])
> >>> events = sqlContext.jsonRDD(raw)
> >>> events.printSchema()
> root
>  |-- a: IntegerType
> >>> events.cache()
> PythonRDD[45] at RDD at PythonRDD.scala:37
> >>> events.unpersist()
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/root/spark/python/pyspark/sql.py", line 440, in unpersist
> self._jschema_rdd.unpersist()
>   File "/root/spark/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 
> 537, in __call__
>   File "/root/spark/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 
> 304, in get_return_value
> py4j.protocol.Py4JError: An error occurred while calling o108.unpersist. 
> Trace:
> py4j.Py4JException: Method unpersist([]) does not exist
>   at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
>   at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
>   at py4j.Gateway.invoke(Gateway.java:251)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:207)
>   at java.lang.Thread.run(Thread.java:745)
> >>> events.unpersist
>  PythonRDD.scala:37>
> {code}
> Note that the {{unpersist}} method exists but cannot be called without 
> raising the shown error.
> This is on {{1.0.2-rc1}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1997) Update breeze to version 0.8.1

2014-08-02 Thread Anand Avati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083823#comment-14083823
 ] 

Anand Avati commented on SPARK-1997:


[~mengxr] I have re-posted the same #940 patch here - 
https://github.com/apache/spark/pull/1749

Please let us know if there are any further concerns.

> Update breeze to version 0.8.1
> --
>
> Key: SPARK-1997
> URL: https://issues.apache.org/jira/browse/SPARK-1997
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
> Fix For: 1.1.0
>
>
> {{breeze 0.7}} does not support {{scala 2.11}} .



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1997) Update breeze to version 0.8.1

2014-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083824#comment-14083824
 ] 

Apache Spark commented on SPARK-1997:
-

User 'avati' has created a pull request for this issue:
https://github.com/apache/spark/pull/1749

> Update breeze to version 0.8.1
> --
>
> Key: SPARK-1997
> URL: https://issues.apache.org/jira/browse/SPARK-1997
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
> Fix For: 1.1.0
>
>
> {{breeze 0.7}} does not support {{scala 2.11}} .



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2739) Rename registerAsTable to registerTempTable

2014-08-02 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-2739.
-

   Resolution: Fixed
Fix Version/s: 1.1.0

> Rename registerAsTable to registerTempTable
> ---
>
> Key: SPARK-2739
> URL: https://issues.apache.org/jira/browse/SPARK-2739
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>Priority: Blocker
> Fix For: 1.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1997) Update breeze to version 0.8.1

2014-08-02 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083830#comment-14083830
 ] 

Xiangrui Meng commented on SPARK-1997:
--

I sent a PR to breeze: https://github.com/scalanlp/breeze/pull/288 . If David 
feels okay with the change and helps us cut a release, we should wait and use 
the new version.

Are we going to support Scala 2.11 in Spark v1.1? If not, the breeze version is 
not a blocker for v1.1. Neither breeze 0.7 nor 0.8.1 is a good candidate for 
v1.1 because of the scalalogging dependency, which I didn't notice until I 
merged #940. So I don't see why we need to upgrade to 0.8.1 at this time.

> Update breeze to version 0.8.1
> --
>
> Key: SPARK-1997
> URL: https://issues.apache.org/jira/browse/SPARK-1997
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
> Fix For: 1.1.0
>
>
> {{breeze 0.7}} does not support {{scala 2.11}} .



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2481) The environment variables SPARK_HISTORY_OPTS is covered in start-history-server.sh

2014-08-02 Thread Guoqiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guoqiang Li updated SPARK-2481:
---

Affects Version/s: 1.0.1
   1.0.0

> The environment variables SPARK_HISTORY_OPTS is covered in 
> start-history-server.sh
> --
>
> Key: SPARK-2481
> URL: https://issues.apache.org/jira/browse/SPARK-2481
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0, 1.0.1
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
>
> If we have the following code in the conf/spark-env.sh  
> {{export SPARK_HISTORY_OPTS="-DSpark.history.XX=XX"}}
> The environment variables SPARK_HISTORY_OPTS is covered in 
> [start-history-server.sh|https://github.com/apache/spark/blob/master/sbin/start-history-server.sh]
>  
> {code}
> if [ $# != 0 ]; then
>   echo "Using command line arguments for setting the log directory is 
> deprecated. Please "
>   echo "set the spark.history.fs.logDirectory configuration option instead."
>   export SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS 
> -Dspark.history.fs.logDirectory=$1"
> fi
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1981) Add AWS Kinesis streaming support

2014-08-02 Thread Chris Fregly (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083836#comment-14083836
 ] 

Chris Fregly commented on SPARK-1981:
-

Hey Nick,

Due to the Kinesis Client Library's ASL license restriction, we ended up 
isolating all Kinesis-related code in the extras/kinesis-asl module.

The module can be activated at build time by including -Pkinesis-asl in either 
sbt or Maven.

This is all documented here, by the way:
https://github.com/apache/spark/blob/master/docs/streaming-kinesis.md

It looks like I messed up the markdown a bit (whoops), but the details are all 
there. I'll try to clean that up.

> Add AWS Kinesis streaming support
> -
>
> Key: SPARK-1981
> URL: https://issues.apache.org/jira/browse/SPARK-1981
> Project: Spark
>  Issue Type: New Feature
>  Components: Streaming
>Reporter: Chris Fregly
>Assignee: Chris Fregly
> Fix For: 1.1.0
>
>
> Add AWS Kinesis support to Spark Streaming.
> Initial discussion occured here:  https://github.com/apache/spark/pull/223
> I discussed this with Parviz from AWS recently and we agreed that I would 
> take this over.
> Look for a new PR that takes into account all the feedback from the earlier 
> PR including spark-1.0-compliant implementation, AWS-license-aware build 
> support, tests, comments, and style guide compliance.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1997) Update breeze to version 0.8.1

2014-08-02 Thread Anand Avati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083843#comment-14083843
 ] 

Anand Avati commented on SPARK-1997:


The reason for the upgrade is not 1.1; it is the Scala 2.11 port. Only breeze 
0.8.1 is available for Scala 2.11, so all else being equal, that is a reason to 
move to 0.8.1 at some point.

This update is for master only and is not to be back-ported to branch-1.1.

Thanks



> Update breeze to version 0.8.1
> --
>
> Key: SPARK-1997
> URL: https://issues.apache.org/jira/browse/SPARK-1997
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
> Fix For: 1.1.0
>
>
> {{breeze 0.7}} does not support {{scala 2.11}} .



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large

2014-08-02 Thread Carlos Fuertes (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083852#comment-14083852
 ] 

Carlos Fuertes commented on SPARK-2017:
---

I have added to PR [1682] a configuration property 
"spark.ui.jsRenderingEnabled" that controls whether the tables are rendered 
with JavaScript or not. It is enabled by default. This ensures that people who 
cannot, or do not want to, use JavaScript for rendering can still use the web 
UI as before.
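If the property lands as described, opting out on the application side would presumably 
look like the following; this is a sketch, and the key name comes from the PR, so it may 
still change:

{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("example")
  .set("spark.ui.jsRenderingEnabled", "false")   // fall back to server-rendered tables
{code}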

> web ui stage page becomes unresponsive when the number of tasks is large
> 
>
> Key: SPARK-2017
> URL: https://issues.apache.org/jira/browse/SPARK-2017
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Reporter: Reynold Xin
>  Labels: starter
>
> {code}
> sc.parallelize(1 to 1000000, 1000000).count()
> {code}
> The above code creates one million tasks to be executed. The stage detail web 
> ui page takes forever to load (if it ever completes).
> There are again a few different alternatives:
> 0. Limit the number of tasks we show.
> 1. Pagination
> 2. By default only show the aggregate metrics and failed tasks, and hide the 
> successful ones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2797) SchemaRDDs don't support unpersist()

2014-08-02 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083857#comment-14083857
 ] 

Nicholas Chammas commented on SPARK-2797:
-

{quote}
I think we also need to refactor sql.py later. Some variable names are 
misleading (for example, _jschema_rdd is a Scala SchemaRDD rather than a 
JavaSchemaRDD).
{quote}

Yin, is there a JIRA to track that? (Do we need one?)

By the way, thank you for resolving this so quickly.

> SchemaRDDs don't support unpersist()
> 
>
> Key: SPARK-2797
> URL: https://issues.apache.org/jira/browse/SPARK-2797
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.0.2
>Reporter: Nicholas Chammas
>Assignee: Yin Huai
> Fix For: 1.1.0
>
>
> Looks like something simple got missed in the Java layer?
> {code}
> >>> from pyspark.sql import SQLContext
> >>> sqlContext = SQLContext(sc)
> >>> raw = sc.parallelize(['{"a": 5}'])
> >>> events = sqlContext.jsonRDD(raw)
> >>> events.printSchema()
> root
>  |-- a: IntegerType
> >>> events.cache()
> PythonRDD[45] at RDD at PythonRDD.scala:37
> >>> events.unpersist()
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/root/spark/python/pyspark/sql.py", line 440, in unpersist
> self._jschema_rdd.unpersist()
>   File "/root/spark/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 
> 537, in __call__
>   File "/root/spark/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 
> 304, in get_return_value
> py4j.protocol.Py4JError: An error occurred while calling o108.unpersist. 
> Trace:
> py4j.Py4JException: Method unpersist([]) does not exist
>   at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
>   at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
>   at py4j.Gateway.invoke(Gateway.java:251)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:207)
>   at java.lang.Thread.run(Thread.java:745)
> >>> events.unpersist
>  PythonRDD.scala:37>
> {code}
> Note that the {{unpersist}} method exists but cannot be called without 
> raising the shown error.
> This is on {{1.0.2-rc1}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-2813) Implement SQRT() directly in Catalyst

2014-08-02 Thread William Benton (JIRA)
William Benton created SPARK-2813:
-

 Summary: Implement SQRT() directly in Catalyst
 Key: SPARK-2813
 URL: https://issues.apache.org/jira/browse/SPARK-2813
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.0.0
Reporter: William Benton
Priority: Minor
 Fix For: 1.1.0


Instead of delegating square root computation to a Hive UDF, Spark should 
implement SQL SQRT() directly.
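To illustrate the point without reproducing Catalyst's internals, here is a self-contained 
sketch of a natively evaluated expression node (these are made-up classes, not Spark's real 
Expression API): the engine can compute the square root in-process, with NULL propagation, 
instead of round-tripping through a Hive UDF.

{code}
// Toy expression tree, for illustration only.
sealed trait Expr { def eval(row: Map[String, Any]): Any }

case class Col(name: String) extends Expr {
  def eval(row: Map[String, Any]): Any = row.getOrElse(name, null)
}

case class Sqrt(child: Expr) extends Expr {
  def eval(row: Map[String, Any]): Any = child.eval(row) match {
    case null      => null                        // SQL NULL propagates
    case n: Number => math.sqrt(n.doubleValue())  // evaluated natively, no Hive UDF call
  }
}

// SELECT SQRT(x) for a single row:
val result = Sqrt(Col("x")).eval(Map("x" -> 16.0))   // 4.0
{code}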



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2813) Implement SQRT() directly in Catalyst

2014-08-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083862#comment-14083862
 ] 

Apache Spark commented on SPARK-2813:
-

User 'willb' has created a pull request for this issue:
https://github.com/apache/spark/pull/1750

> Implement SQRT() directly in Catalyst
> -
>
> Key: SPARK-2813
> URL: https://issues.apache.org/jira/browse/SPARK-2813
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.0.0
>Reporter: William Benton
>Priority: Minor
> Fix For: 1.1.0
>
>
> Instead of delegating square root computation to a Hive UDF, Spark should 
> implement SQL SQRT() directly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2813) Implement SQRT() directly in Spark SQL

2014-08-02 Thread William Benton (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Benton updated SPARK-2813:
--

Summary: Implement SQRT() directly in Spark SQL  (was: Implement SQRT() 
directly in Spark SQP)

> Implement SQRT() directly in Spark SQL
> --
>
> Key: SPARK-2813
> URL: https://issues.apache.org/jira/browse/SPARK-2813
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.0.0
>Reporter: William Benton
>Priority: Minor
> Fix For: 1.1.0
>
>
> Instead of delegating square root computation to a Hive UDF, Spark should 
> implement SQL SQRT() directly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2813) Implement SQRT() directly in Spark SQP

2014-08-02 Thread William Benton (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Benton updated SPARK-2813:
--

Summary: Implement SQRT() directly in Spark SQP  (was: Implement SQRT() 
directly in Catalyst)

> Implement SQRT() directly in Spark SQP
> --
>
> Key: SPARK-2813
> URL: https://issues.apache.org/jira/browse/SPARK-2813
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.0.0
>Reporter: William Benton
>Priority: Minor
> Fix For: 1.1.0
>
>
> Instead of delegating square root computation to a Hive UDF, Spark should 
> implement SQL SQRT() directly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2602) sbt/sbt test steals window focus on OS X

2014-08-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2602:
---

Assignee: Sean Owen

> sbt/sbt test steals window focus on OS X
> 
>
> Key: SPARK-2602
> URL: https://issues.apache.org/jira/browse/SPARK-2602
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Nicholas Chammas
>Assignee: Sean Owen
>Priority: Minor
>
> On OS X, I run {{sbt/sbt test}} from Terminal and then go off and do 
> something else with my computer. It appears that there are several things in 
> the test suite that launch Java programs that, for some reason, steal window 
> focus. 
> It can get very annoying, especially if you happen to be typing something in 
> a different window, to be suddenly teleported to a random Java application 
> and have your finely crafted keystrokes be sent where they weren't intended.
> It would be nice if {{sbt/sbt test}} didn't do that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2602) sbt/sbt test steals window focus on OS X

2014-08-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2602.


   Resolution: Fixed
Fix Version/s: 1.1.0

Issue resolved by pull request 1747
[https://github.com/apache/spark/pull/1747]

> sbt/sbt test steals window focus on OS X
> 
>
> Key: SPARK-2602
> URL: https://issues.apache.org/jira/browse/SPARK-2602
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Nicholas Chammas
>Assignee: Sean Owen
>Priority: Minor
> Fix For: 1.1.0
>
>
> On OS X, I run {{sbt/sbt test}} from Terminal and then go off and do 
> something else with my computer. It appears that there are several things in 
> the test suite that launch Java programs that, for some reason, steal window 
> focus. 
> It can get very annoying, especially if you happen to be typing something in 
> a different window, to be suddenly teleported to a random Java application 
> and have your finely crafted keystrokes be sent where they weren't intended.
> It would be nice if {{sbt/sbt test}} didn't do that.
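For context, one common way to keep forked test JVMs from grabbing window focus on OS X is to run them headless so AWT never creates a GUI context. A sketch of that approach as an sbt setting (illustrative only; not necessarily the exact change made in pull request 1747):

{code}
// build.sbt sketch: fork the test JVMs and pass the headless flag so AWT
// never initializes a window that could steal focus during test runs.
fork in Test := true
javaOptions in Test += "-Djava.awt.headless=true"
{code}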



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2627) Check for PEP 8 compliance on all Python code in the Jenkins CI cycle

2014-08-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2627:
---

Assignee: Nicholas Chammas

> Check for PEP 8 compliance on all Python code in the Jenkins CI cycle
> -
>
> Key: SPARK-2627
> URL: https://issues.apache.org/jira/browse/SPARK-2627
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>
> This issue was triggered by [the discussion 
> here|https://github.com/apache/spark/pull/1505#issuecomment-49698681].
> Requirements:
> * make a linter script for Scala under {{dev/lint-scala}} that just calls 
> {{scalastyle}}
> * make a linter script for Python under {{dev/lint-python}} that calls 
> {{pep8}} on all Python files
> ** One exception to this is {{cloudpickle.py}}, which is a third-party module 
> [we don't want to 
> touch|https://github.com/apache/spark/pull/1505#discussion-diff-15197904]
> * Modify {{dev/run-tests}} to call both linter scripts
> * Incorporate these changes into the [Contributing to 
> Spark|https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark]
>  guide



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2414) Remove jquery

2014-08-02 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083867#comment-14083867
 ] 

Patrick Wendell commented on SPARK-2414:


I've merged Sean's PR to address the license coverage:
https://github.com/apache/spark/pull/1748

> Remove jquery
> -
>
> Key: SPARK-2414
> URL: https://issues.apache.org/jira/browse/SPARK-2414
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Reynold Xin
>Priority: Minor
>  Labels: starter
>
> SPARK-2384 introduces jQuery for tooltip display. We can probably just write 
> a very simple JavaScript tooltip instead of pulling in jQuery. 
> https://github.com/apache/spark/pull/1314



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2742) The variable inputFormatInfo and inputFormatMap never used

2014-08-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2742.


Resolution: Fixed

> The variable inputFormatInfo and inputFormatMap never used
> --
>
> Key: SPARK-2742
> URL: https://issues.apache.org/jira/browse/SPARK-2742
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Reporter: meiyoula
>Priority: Minor
>
> The ClientArguments class has two unused variables: inputFormatInfo and 
> inputFormatMap.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2573) This file "make-distribution.sh" has an error, please fix it

2014-08-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2573.


Resolution: Fixed

> This file "make-distribution.sh" has an error, please fix it
> 
>
> Key: SPARK-2573
> URL: https://issues.apache.org/jira/browse/SPARK-2573
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.0.0
>Reporter: 王金子
>  Labels: build
>
> Line 61 reads:
> echo "Error: '--with-hive' is no longer supported, use Maven option -Pyarn".
> The message should tell users to use -Phive instead. Otherwise, I preferred 
> the old behavior; why was it changed?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2354) BitSet Range Expanded when creating new one

2014-08-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2354.


Resolution: Not a Problem

Closing per [~srowen]

> BitSet Range Expanded when creating new one
> ---
>
> Key: SPARK-2354
> URL: https://issues.apache.org/jira/browse/SPARK-2354
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Yijie Shen
>Priority: Minor
>
> BitSet has a constructor parameter "numBits: Int" that indicates the number 
> of bits it holds.
> There is also a method called "capacity" that returns the number of long 
> words used to hold those bits.
> When creating a new BitSet, for example in '|', I expected the new set to be 
> sized by the larger numBits of the two operands, not by the larger word 
> capacity:
> {code}
> def |(other: BitSet): BitSet = {
>   val newBS = new BitSet(math.max(numBits, other.numBits))
>   // I know by now that numBits isn't a field
> {code}
> Is there some other reason for expanding the BitSet range that I'm missing?
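One likely reading of the "Not a Problem" resolution: the apparent expansion is just word rounding. If the set is backed by an Array[Long] (as the description implies), numBits is rounded up to whole 64-bit words, so the result can only grow to the next multiple of 64, and the extra bits are never set. A small sketch of that sizing, with assumed helper names:

{code}
object BitSetSizingSketch {
  // Round numBits up to whole 64-bit words (assumed Array[Long] backing).
  def numWords(numBits: Int): Int = ((numBits - 1) >> 6) + 1
  def capacity(numBits: Int): Int = numWords(numBits) * 64

  def main(args: Array[String]): Unit = {
    // Combining sets of 100 and 130 bits: the result is sized to the word
    // boundary (192 bits) rather than 130, but bits beyond the originals are
    // never set, so the expansion is harmless.
    println(capacity(math.max(100, 130))) // 192
  }
}
{code}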



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2382) build error:

2014-08-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2382.


Resolution: Not a Problem

Closing as per [~srowen]

> build error: 
> -
>
> Key: SPARK-2382
> URL: https://issues.apache.org/jira/browse/SPARK-2382
> Project: Spark
>  Issue Type: Question
>  Components: Build
>Affects Versions: 1.0.0
> Environment: Ubuntu 12.0.4 precise. 
> spark@ubuntu-cdh5-spark:~/spark-1.0.0$ mvn -version
> Apache Maven 3.0.4
> Maven home: /usr/share/maven
> Java version: 1.6.0_31, vendor: Sun Microsystems Inc.
> Java home: /usr/lib/jvm/j2sdk1.6-oracle/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "3.11.0-15-generic", arch: "amd64", family: "unix"
>Reporter: Mukul Jain
>  Labels: newbie
>
> Unable to build: Maven can't download a dependency. I checked my http_proxy 
> and https_proxy settings and they are working fine, and other http and https 
> dependencies were downloaded without problems, but the build always gets 
> stuck at this repository. Downloading the artifact manually also fails with 
> an exception. 
> [INFO] 
> 
> [INFO] Building Spark Project External MQTT 1.0.0
> [INFO] 
> 
> Downloading: 
> https://repository.apache.org/content/repositories/releases/org/eclipse/paho/mqtt-client/0.4.0/mqtt-client-0.4.0.pom
> Jul 6, 2014 4:53:26 PM org.apache.commons.httpclient.HttpMethodDirector 
> executeWithRetry
> INFO: I/O exception (java.net.ConnectException) caught when processing 
> request: Connection timed out
> Jul 6, 2014 4:53:26 PM org.apache.commons.httpclient.HttpMethodDirector 
> executeWithRetry
> INFO: Retrying request



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2728) Integer overflow in partition index calculation RangePartitioner

2014-08-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2728.


Resolution: Not a Problem

Closing as per [~srowen]

> Integer overflow in partition index calculation RangePartitioner
> 
>
> Key: SPARK-2728
> URL: https://issues.apache.org/jira/browse/SPARK-2728
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
> Environment: Spark 1.0.1
>Reporter: Jianshi Huang
>  Labels: easyfix
>
> If the number of partitions is greater than 10362, Spark reports an 
> ArrayIndexOutOfBoundsException.
> The cause is the partition index calculation in rangeBounds (around line 112):
> val bounds = new Array[K](partitions - 1)
> for (i <- 0 until partitions - 1) {
>   val index = (rddSample.length - 1) * (i + 1) / partitions
>   bounds(i) = rddSample(index)
> }
> Here (rddSample.length - 1) * (i + 1) overflows to a negative Int.
> Casting rddSample.length - 1 to Long should be enough for a fix?
> Jianshi
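A small, self-contained sketch of the reported overflow. The numbers below are hypothetical, chosen only so that the Int product exceeds Int.MaxValue (roughly 20 samples per partition, just past the ~10k-partition threshold mentioned above):

{code}
object RangeBoundsOverflowSketch {
  def main(args: Array[String]): Unit = {
    val sampleLength = 207260         // hypothetical sample size
    val partitions   = 10363          // just past the reported threshold
    val i            = partitions - 2 // last iteration of the loop

    // Int arithmetic, as in the quoted snippet: the product exceeds
    // Int.MaxValue and wraps to a negative value, so the index is negative.
    val intIndex = (sampleLength - 1) * (i + 1) / partitions
    println(intIndex)                 // negative -> out-of-bounds access

    // The suggested fix: widen one operand to Long before multiplying, then
    // narrow the final quotient back to Int.
    val longIndex = ((sampleLength - 1).toLong * (i + 1) / partitions).toInt
    println(longIndex)                // valid index < sampleLength
  }
}
{code}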



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org