[jira] [Commented] (SPARK-9228) Combine unsafe and codegen into a single option

2015-08-20 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706315#comment-14706315
 ] 

Yi Zhou commented on SPARK-9228:


Thanks [~davies] !

> Combine unsafe and codegen into a single option
> ---
>
> Key: SPARK-9228
> URL: https://issues.apache.org/jira/browse/SPARK-9228
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Davies Liu
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Before QA, let's flip on features and consolidate unsafe and codegen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10122) AttributeError: 'RDD' object has no attribute 'offsetRanges'

2015-08-20 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706306#comment-14706306
 ] 

Saisai Shao commented on SPARK-10122:
-

Thanks :).

> AttributeError: 'RDD' object has no attribute 'offsetRanges'
> 
>
> Key: SPARK-10122
> URL: https://issues.apache.org/jira/browse/SPARK-10122
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Streaming
>Reporter: Amit Ramesh
>  Labels: kafka
>
> SPARK-8389 added the offsetRanges interface to Kafka direct streams. This 
> however appears to break when chaining operations after a transform 
> operation. Following is example code that would result in an error (stack 
> trace below). Note that if the 'count()' operation is taken out of the 
> example code then this error does not occur anymore, and the Kafka data is 
> printed.
> {code:title=kafka_test.py|collapse=true}
> from pyspark import SparkContext
> from pyspark.streaming import StreamingContext
> from pyspark.streaming.kafka import KafkaUtils
>
> def attach_kafka_metadata(kafka_rdd):
>     offset_ranges = kafka_rdd.offsetRanges()
>     return kafka_rdd
>
> if __name__ == "__main__":
>     sc = SparkContext(appName='kafka-test')
>     ssc = StreamingContext(sc, 10)
>     kafka_stream = KafkaUtils.createDirectStream(
>         ssc,
>         [TOPIC],
>         kafkaParams={
>             'metadata.broker.list': BROKERS,
>         },
>     )
>     kafka_stream.transform(attach_kafka_metadata).count().pprint()
>     ssc.start()
>     ssc.awaitTermination()
> {code}
> {code:title=Stack trace|collapse=true}
> Traceback (most recent call last):
>   File "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/util.py", 
> line 62, in call
>     r = self.func(t, *rdds)
>   File 
> "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/dstream.py", line 
> 616, in <lambda>
>     self.func = lambda t, rdd: func(t, prev_func(t, rdd))
>   File 
> "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/dstream.py", line 
> 616, in <lambda>
>     self.func = lambda t, rdd: func(t, prev_func(t, rdd))
>   File 
> "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/dstream.py", line 
> 616, in <lambda>
>     self.func = lambda t, rdd: func(t, prev_func(t, rdd))
>   File 
> "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/dstream.py", line 
> 616, in <lambda>
>     self.func = lambda t, rdd: func(t, prev_func(t, rdd))
>   File "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", 
> line 332, in <lambda>
>     func = lambda t, rdd: oldfunc(rdd)
>   File "/home/spark/ad_realtime/batch/kafka_test.py", line 7, in 
> attach_kafka_metadata
>     offset_ranges = kafka_rdd.offsetRanges()
> AttributeError: 'RDD' object has no attribute 'offsetRanges'
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9228) Combine unsafe and codegen into a single option

2015-08-20 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706289#comment-14706289
 ] 

Davies Liu commented on SPARK-9228:
---

Right now, it's an internal configuration (it could be changed or removed in the 
next release); we keep it only for debugging purposes.
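
For anyone who still wants to flip it while debugging, here is a minimal PySpark 
sketch. The consolidated flag name ({{spark.sql.tungsten.enabled}}) is an 
assumption based on this ticket; since the conf is internal, it may be renamed 
or removed.

{code:title=toggle_internal_conf.py (sketch)}
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName='tungsten-debug')
sqlContext = SQLContext(sc)

# Assumed flag name for the consolidated unsafe + codegen switch; internal,
# debug-only, and subject to change or removal.
sqlContext.setConf("spark.sql.tungsten.enabled", "false")

sqlContext.range(0, 10).groupBy().count().show()
{code}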

> Combine unsafe and codegen into a single option
> ---
>
> Key: SPARK-9228
> URL: https://issues.apache.org/jira/browse/SPARK-9228
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Davies Liu
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Before QA, let's flip on features and consolidate unsafe and codegen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10148) Display active and inactive receiver numbers in Streaming page

2015-08-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10148:


Assignee: Apache Spark

> Display active and inactive receiver numbers in Streaming page
> --
>
> Key: SPARK-10148
> URL: https://issues.apache.org/jira/browse/SPARK-10148
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Shixiong Zhu
>Assignee: Apache Spark
>
> Displaying active and inactive receiver numbers on the Streaming page is helpful 
> for understanding whether receivers have started or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10148) Display active and inactive receiver numbers in Streaming page

2015-08-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10148:


Assignee: (was: Apache Spark)

> Display active and inactive receiver numbers in Streaming page
> --
>
> Key: SPARK-10148
> URL: https://issues.apache.org/jira/browse/SPARK-10148
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Shixiong Zhu
>
> Displaying active and inactive receiver numbers on the Streaming page is helpful 
> for understanding whether receivers have started or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10148) Display active and inactive receiver numbers in Streaming page

2015-08-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706288#comment-14706288
 ] 

Apache Spark commented on SPARK-10148:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/8351

> Display active and inactive receiver numbers in Streaming page
> --
>
> Key: SPARK-10148
> URL: https://issues.apache.org/jira/browse/SPARK-10148
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Shixiong Zhu
>
> Displaying active and inactive receiver numbers on the Streaming page is helpful 
> for understanding whether receivers have started or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10122) AttributeError: 'RDD' object has no attribute 'offsetRanges'

2015-08-20 Thread Amit Ramesh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706286#comment-14706286
 ] 

Amit Ramesh commented on SPARK-10122:
-

[~jerryshao] thanks for jumping onto this right away! I tried your patch with 
the example I provided in this ticket, and also with the original, more 
involved code in which we first witnessed this issue; both seem to be 
working fine :).

> AttributeError: 'RDD' object has no attribute 'offsetRanges'
> 
>
> Key: SPARK-10122
> URL: https://issues.apache.org/jira/browse/SPARK-10122
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Streaming
>Reporter: Amit Ramesh
>  Labels: kafka
>
> SPARK-8389 added the offsetRanges interface to Kafka direct streams. This 
> however appears to break when chaining operations after a transform 
> operation. Following is example code that would result in an error (stack 
> trace below). Note that if the 'count()' operation is taken out of the 
> example code then this error does not occur anymore, and the Kafka data is 
> printed.
> {code:title=kafka_test.py|collapse=true}
> from pyspark import SparkContext
> from pyspark.streaming import StreamingContext
> from pyspark.streaming.kafka import KafkaUtils
>
> def attach_kafka_metadata(kafka_rdd):
>     offset_ranges = kafka_rdd.offsetRanges()
>     return kafka_rdd
>
> if __name__ == "__main__":
>     sc = SparkContext(appName='kafka-test')
>     ssc = StreamingContext(sc, 10)
>     kafka_stream = KafkaUtils.createDirectStream(
>         ssc,
>         [TOPIC],
>         kafkaParams={
>             'metadata.broker.list': BROKERS,
>         },
>     )
>     kafka_stream.transform(attach_kafka_metadata).count().pprint()
>     ssc.start()
>     ssc.awaitTermination()
> {code}
> {code:title=Stack trace|collapse=true}
> Traceback (most recent call last):
>   File "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/util.py", 
> line 62, in call
>     r = self.func(t, *rdds)
>   File 
> "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/dstream.py", line 
> 616, in <lambda>
>     self.func = lambda t, rdd: func(t, prev_func(t, rdd))
>   File 
> "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/dstream.py", line 
> 616, in <lambda>
>     self.func = lambda t, rdd: func(t, prev_func(t, rdd))
>   File 
> "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/dstream.py", line 
> 616, in <lambda>
>     self.func = lambda t, rdd: func(t, prev_func(t, rdd))
>   File 
> "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/dstream.py", line 
> 616, in <lambda>
>     self.func = lambda t, rdd: func(t, prev_func(t, rdd))
>   File "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", 
> line 332, in <lambda>
>     func = lambda t, rdd: oldfunc(rdd)
>   File "/home/spark/ad_realtime/batch/kafka_test.py", line 7, in 
> attach_kafka_metadata
>     offset_ranges = kafka_rdd.offsetRanges()
> AttributeError: 'RDD' object has no attribute 'offsetRanges'
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10148) Display active and inactive receiver numbers in Streaming page

2015-08-20 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-10148:


 Summary: Display active and inactive receiver numbers in Streaming 
page
 Key: SPARK-10148
 URL: https://issues.apache.org/jira/browse/SPARK-10148
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Reporter: Shixiong Zhu


Displaying active and inactive receiver numbers on the Streaming page is helpful 
for understanding whether receivers have started or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-10147) App shouldn't show in HistoryServer web when the event file has been deleted on hdfs

2015-08-20 Thread meiyoula (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

meiyoula closed SPARK-10147.

Resolution: Not A Problem

> App shouldn't show in HistoryServer web when the event file has been deleted 
> on hdfs
> 
>
> Key: SPARK-10147
> URL: https://issues.apache.org/jira/browse/SPARK-10147
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: meiyoula
>
> Phenomenon: the app still shows in the HistoryServer web UI when the event file 
> has been deleted on HDFS.
> Cause: the *log-replay-executor* thread and the *clean log* thread both write 
> to the *application* object, so there is a synchronization problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9669) Support PySpark with Mesos Cluster mode

2015-08-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9669:
---

Assignee: Apache Spark

> Support PySpark with Mesos Cluster mode
> ---
>
> Key: SPARK-9669
> URL: https://issues.apache.org/jira/browse/SPARK-9669
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, PySpark
>Reporter: Timothy Chen
>Assignee: Apache Spark
>
> PySpark in cluster mode on Mesos is not yet supported.
> We need to enable it and make sure it can launch PySpark jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9669) Support PySpark with Mesos Cluster mode

2015-08-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9669:
---

Assignee: (was: Apache Spark)

> Support PySpark with Mesos Cluster mode
> ---
>
> Key: SPARK-9669
> URL: https://issues.apache.org/jira/browse/SPARK-9669
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, PySpark
>Reporter: Timothy Chen
>
> PySpark in cluster mode on Mesos is not yet supported.
> We need to enable it and make sure it can launch PySpark jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9669) Support PySpark with Mesos Cluster mode

2015-08-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706262#comment-14706262
 ] 

Apache Spark commented on SPARK-9669:
-

User 'tnachen' has created a pull request for this issue:
https://github.com/apache/spark/pull/8349

> Support PySpark with Mesos Cluster mode
> ---
>
> Key: SPARK-9669
> URL: https://issues.apache.org/jira/browse/SPARK-9669
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, PySpark
>Reporter: Timothy Chen
>
> PySpark in cluster mode on Mesos is not yet supported.
> We need to enable it and make sure it can launch PySpark jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8467) Add LDAModel.describeTopics() in Python

2015-08-20 Thread Hrishikesh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706257#comment-14706257
 ] 

Hrishikesh commented on SPARK-8467:
---

[~yuu.ishik...@gmail.com], are you still working on this?

> Add LDAModel.describeTopics() in Python
> ---
>
> Key: SPARK-8467
> URL: https://issues.apache.org/jira/browse/SPARK-8467
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib, PySpark
>Reporter: Yu Ishikawa
>
> Add LDAModel.describeTopics() in Python.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9999) RDD-like API on top of Catalyst/DataFrame

2015-08-20 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706244#comment-14706244
 ] 

Reynold Xin commented on SPARK-:


This needs to be designed first. I'm not sure static code analysis is a 
great idea, since such analyses often fail. I'm open to ideas, though.


> RDD-like API on top of Catalyst/DataFrame
> -
>
> Key: SPARK-
> URL: https://issues.apache.org/jira/browse/SPARK-
> Project: Spark
>  Issue Type: Story
>  Components: SQL
>Reporter: Reynold Xin
>
> The RDD API is very flexible, but as a result its execution is harder to 
> optimize in some cases. The DataFrame API, on the other hand, is much easier 
> to optimize, but lacks some of the nice perks of the RDD API (e.g. harder to 
> use UDFs, lack of strong types in Scala/Java).
> As a Spark user, I want an API that sits somewhere in the middle of the 
> spectrum so I can write most of my applications with that API, and yet it can 
> be optimized well by Spark to achieve performance and stability.
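
To make the trade-off concrete, a small illustrative PySpark sketch (the data 
and column names are made up): the RDD version can run arbitrary Python but is 
opaque to the optimizer, while the DataFrame version is easy to optimize but 
gives up strong typing and convenient UDFs.

{code:title=rdd_vs_dataframe.py (sketch)}
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName='rdd-vs-dataframe')
sqlContext = SQLContext(sc)

df = sqlContext.createDataFrame([(1, "a"), (2, "b"), (3, "a")], ["id", "key"])

# RDD API: arbitrary Python lambdas, but opaque to the optimizer.
rdd_counts = df.rdd.map(lambda row: (row.key, 1)).reduceByKey(lambda a, b: a + b)

# DataFrame API: declarative and optimizable, but weaker typing and clunkier UDFs.
df_counts = df.groupBy("key").count()

print(rdd_counts.collect())
df_counts.show()
{code}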



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9983) Local physical operators for query execution

2015-08-20 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-9983:
---
Description: 
In distributed query execution, there are two kinds of operators:

(1) operators that exchange data between different executors or threads: 
examples include broadcast, shuffle.

(2) operators that process data in a single thread: examples include project, 
filter, group by, etc.

This ticket proposes clearly differentiating them and creating local operators in 
Spark. This leads to a lot of benefits: easier to test, easier to optimize data 
exchange, better design (single responsibility), and potentially even having a 
hyper-optimized single-node version of DataFrame.


  was:
In distributed query execution, there are two kinds of operators:

(1) operators that exchange data between different executors or threads: 
examples include broadcast, shuffle.

(2) operators that process data in a single thread: examples include project, 
filter, group by, etc.

This ticket proposes clearly differentiating them and creating local operators in 
Spark. This leads to a lot of benefits: easier to test, easier to optimize data 
exchange, and better design (single responsibility).




> Local physical operators for query execution
> 
>
> Key: SPARK-9983
> URL: https://issues.apache.org/jira/browse/SPARK-9983
> Project: Spark
>  Issue Type: Story
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Shixiong Zhu
>
> In distributed query execution, there are two kinds of operators:
> (1) operators that exchange data between different executors or threads: 
> examples include broadcast, shuffle.
> (2) operators that process data in a single thread: examples include project, 
> filter, group by, etc.
> This ticket proposes clearly differentiating them and creating local operators 
> in Spark. This leads to a lot of benefits: easier to test, easier to optimize 
> data exchange, better design (single responsibility), and potentially even 
> having a hyper-optimized single-node version of DataFrame.
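
The split between the two kinds of operators can be sketched roughly in Python; 
this is illustrative only, and the class names below are made up rather than 
Spark's actual operator classes.

{code:title=local_operators_sketch.py}
# Illustrative "local" (single-thread) operators, kept separate from exchange
# operators such as shuffle or broadcast. Class names are made up.

class LocalScan(object):
    def __init__(self, rows):
        self.rows = rows
    def execute(self):
        return iter(self.rows)

class LocalFilter(object):
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def execute(self):
        return (row for row in self.child.execute() if self.predicate(row))

class LocalProject(object):
    def __init__(self, child, projection):
        self.child, self.projection = child, projection
    def execute(self):
        return (self.projection(row) for row in self.child.execute())

# A purely single-node pipeline like this is easy to unit test in isolation.
plan = LocalProject(LocalFilter(LocalScan([1, 2, 3, 4]), lambda x: x % 2 == 0),
                    lambda x: x * 10)
print(list(plan.execute()))  # [20, 40]
{code}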



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10147) App shouldn't show in HistoryServer web when the event file has been deleted on hdfs

2015-08-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10147:


Assignee: Apache Spark

> App shouldn't show in HistoryServer web when the event file has been deleted 
> on hdfs
> 
>
> Key: SPARK-10147
> URL: https://issues.apache.org/jira/browse/SPARK-10147
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: meiyoula
>Assignee: Apache Spark
>
> Phenomenon: the app still shows in the HistoryServer web UI when the event file 
> has been deleted on HDFS.
> Cause: the *log-replay-executor* thread and the *clean log* thread both write 
> to the *application* object, so there is a synchronization problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10147) App shouldn't show in HistoryServer web when the event file has been deleted on hdfs

2015-08-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10147:


Assignee: (was: Apache Spark)

> App shouldn't show in HistoryServer web when the event file has been deleted 
> on hdfs
> 
>
> Key: SPARK-10147
> URL: https://issues.apache.org/jira/browse/SPARK-10147
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: meiyoula
>
> Phenomenon: the app still shows in the HistoryServer web UI when the event file 
> has been deleted on HDFS.
> Cause: the *log-replay-executor* thread and the *clean log* thread both write 
> to the *application* object, so there is a synchronization problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10147) App shouldn't show in HistoryServer web when the event file has been deleted on hdfs

2015-08-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706217#comment-14706217
 ] 

Apache Spark commented on SPARK-10147:
--

User 'XuTingjun' has created a pull request for this issue:
https://github.com/apache/spark/pull/8348

> App shouldn't show in HistoryServer web when the event file has been deleted 
> on hdfs
> 
>
> Key: SPARK-10147
> URL: https://issues.apache.org/jira/browse/SPARK-10147
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: meiyoula
>
> Phenomenon: the app still shows in the HistoryServer web UI when the event file 
> has been deleted on HDFS.
> Cause: the *log-replay-executor* thread and the *clean log* thread both write 
> to the *application* object, so there is a synchronization problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10147) App shouldn't show in HistoryServer web when the event file has been deleted on hdfs

2015-08-20 Thread meiyoula (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

meiyoula updated SPARK-10147:
-
Description: 
Phenomenon: the app still shows in the HistoryServer web UI when the event file 
has been deleted on HDFS.
Cause: the *log-replay-executor* thread and the *clean log* thread both write 
to the *application* object, so there is a synchronization problem.

  was:

It is because the *log-replay-executor* thread and the *clean log* thread both 
write to the *application* object, so there is a synchronization problem.


> App shouldn't show in HistoryServer web when the event file has been deleted 
> on hdfs
> 
>
> Key: SPARK-10147
> URL: https://issues.apache.org/jira/browse/SPARK-10147
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: meiyoula
>
> Phenomenon: the app still shows in the HistoryServer web UI when the event file 
> has been deleted on HDFS.
> Cause: the *log-replay-executor* thread and the *clean log* thread both write 
> to the *application* object, so there is a synchronization problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10147) App shouldn't show in HistoryServer web when the event file has been deleted on hdfs

2015-08-20 Thread meiyoula (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

meiyoula updated SPARK-10147:
-
Summary: App shouldn't show in HistoryServer web when the event file has 
been deleted on hdfs  (was: App still shows in HistoryServer web when the event 
file has been deleted on hdfs)

> App shouldn't show in HistoryServer web when the event file has been deleted 
> on hdfs
> 
>
> Key: SPARK-10147
> URL: https://issues.apache.org/jira/browse/SPARK-10147
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: meiyoula
>
> It is because the *log-replay-executor* thread and the *clean log* thread both 
> write to the *application* object, so there is a synchronization problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10147) App shouldn't show in HistoryServer web when the event file has been deleted on hdfs

2015-08-20 Thread meiyoula (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

meiyoula updated SPARK-10147:
-
Description: 

It is because the *log-replay-executor* thread and the *clean log* thread both 
write to the *application* object, so there is a synchronization problem.

  was:It is because the *log-replay-executor* thread and the *clean log* thread 
both write to the *application* object, so there is a synchronization problem.


> App shouldn't show in HistoryServer web when the event file has been deleted 
> on hdfs
> 
>
> Key: SPARK-10147
> URL: https://issues.apache.org/jira/browse/SPARK-10147
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: meiyoula
>
> It is because the *log-replay-executor* thread and the *clean log* thread both 
> write to the *application* object, so there is a synchronization problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10147) App still shows in HistoryServer web when the event file has been deleted on hdfs

2015-08-20 Thread meiyoula (JIRA)
meiyoula created SPARK-10147:


 Summary: App still shows in HistoryServer web when the event file 
has been deleted on hdfs
 Key: SPARK-10147
 URL: https://issues.apache.org/jira/browse/SPARK-10147
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: meiyoula


It is because the *log-replay-executor* thread and the *clean log* thread both 
write to the *application* object, so there is a synchronization problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-10146) Have an easy way to set data source reader/writer specific confs

2015-08-20 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706205#comment-14706205
 ] 

Yin Huai edited comment on SPARK-10146 at 8/21/15 3:42 AM:
---

One possible way is that every data source defines a list of confs that can be 
applied to its reader/writer and we let users set those confs in SQLConf or 
through data source options. Then, we propagate those confs to the 
reader/writer.


was (Author: yhuai):
One possible way to do it is that every data source defines a list of confs 
that can be applied to its reader/writer and we let users set those confs in 
SQLConf or through data source options. Then, we propagate those confs to the 
reader/writer.

> Have an easy way to set data source reader/writer specific confs
> 
>
> Key: SPARK-10146
> URL: https://issues.apache.org/jira/browse/SPARK-10146
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Yin Huai
>Priority: Critical
>
> Right now, it is hard to set data source reader/writer specific confs 
> correctly (e.g. Parquet's row group size). Users need to set those confs in 
> the Hadoop conf before starting the application or through 
> {{org.apache.spark.deploy.SparkHadoopUtil.get.conf}} at runtime. It would be 
> great if we had an easy way to set those confs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10146) Have an easy way to set data source reader/writer specific confs

2015-08-20 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706205#comment-14706205
 ] 

Yin Huai commented on SPARK-10146:
--

One possible way to do it is that every data source defines a list of confs 
that can be applied to its reader/writer and we let users set those confs in 
SQLConf or through data source options. Then, we propagate those confs to the 
reader/writer.
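
A rough PySpark sketch of the idea (the option name and its propagation below 
are assumptions for illustration, not an agreed API): today the conf has to be 
set on the Hadoop configuration, while under the proposal it could travel with 
the write call or SQLConf.

{code:title=datasource_conf_sketch.py}
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName='datasource-confs')
sqlContext = SQLContext(sc)
df = sqlContext.range(0, 1000)

# Today: set reader/writer-specific confs on the Hadoop configuration directly
# (one common route; the description also mentions SparkHadoopUtil.get.conf).
sc._jsc.hadoopConfiguration().set("parquet.block.size", str(256 * 1024 * 1024))
df.write.parquet("/tmp/with_hadoop_conf")

# Proposed direction (illustrative only): pass the conf as a data source option
# or a SQLConf entry and have Spark propagate it to the writer.
df.write.option("parquet.block.size", str(256 * 1024 * 1024)) \
    .parquet("/tmp/with_option")
{code}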

> Have an easy way to set data source reader/writer specific confs
> 
>
> Key: SPARK-10146
> URL: https://issues.apache.org/jira/browse/SPARK-10146
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Yin Huai
>Priority: Critical
>
> Right now, it is hard to set data source reader/writer specific confs 
> correctly (e.g. Parquet's row group size). Users need to set those confs in 
> the Hadoop conf before starting the application or through 
> {{org.apache.spark.deploy.SparkHadoopUtil.get.conf}} at runtime. It would be 
> great if we had an easy way to set those confs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10146) Have an easy way to set data source reader/writer specific confs

2015-08-20 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-10146:
-
Issue Type: Improvement  (was: Bug)

> Have an easy way to set data source reader/writer specific confs
> 
>
> Key: SPARK-10146
> URL: https://issues.apache.org/jira/browse/SPARK-10146
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Yin Huai
>Priority: Critical
>
> Right now, it is hard to set data source reader/writer specific confs 
> correctly (e.g. Parquet's row group size). Users need to set those confs in 
> the Hadoop conf before starting the application or through 
> {{org.apache.spark.deploy.SparkHadoopUtil.get.conf}} at runtime. It would be 
> great if we had an easy way to set those confs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10146) Have an easy way to set data source reader/writer specific confs

2015-08-20 Thread Yin Huai (JIRA)
Yin Huai created SPARK-10146:


 Summary: Have an easy way to set data source reader/writer 
specific confs
 Key: SPARK-10146
 URL: https://issues.apache.org/jira/browse/SPARK-10146
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Yin Huai
Priority: Critical


Right now, it is hard to set data source reader/writer specific confs 
correctly (e.g. Parquet's row group size). Users need to set those confs in 
the Hadoop conf before starting the application or through 
{{org.apache.spark.deploy.SparkHadoopUtil.get.conf}} at runtime. It would be 
great if we had an easy way to set those confs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10122) AttributeError: 'RDD' object has no attribute 'offsetRanges'

2015-08-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706202#comment-14706202
 ] 

Apache Spark commented on SPARK-10122:
--

User 'jerryshao' has created a pull request for this issue:
https://github.com/apache/spark/pull/8347

> AttributeError: 'RDD' object has no attribute 'offsetRanges'
> 
>
> Key: SPARK-10122
> URL: https://issues.apache.org/jira/browse/SPARK-10122
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Streaming
>Reporter: Amit Ramesh
>  Labels: kafka
>
> SPARK-8389 added the offsetRanges interface to Kafka direct streams. This 
> however appears to break when chaining operations after a transform 
> operation. Following is example code that would result in an error (stack 
> trace below). Note that if the 'count()' operation is taken out of the 
> example code then this error does not occur anymore, and the Kafka data is 
> printed.
> {code:title=kafka_test.py|collapse=true}
> from pyspark import SparkContext
> from pyspark.streaming import StreamingContext
> from pyspark.streaming.kafka import KafkaUtils
>
> def attach_kafka_metadata(kafka_rdd):
>     offset_ranges = kafka_rdd.offsetRanges()
>     return kafka_rdd
>
> if __name__ == "__main__":
>     sc = SparkContext(appName='kafka-test')
>     ssc = StreamingContext(sc, 10)
>     kafka_stream = KafkaUtils.createDirectStream(
>         ssc,
>         [TOPIC],
>         kafkaParams={
>             'metadata.broker.list': BROKERS,
>         },
>     )
>     kafka_stream.transform(attach_kafka_metadata).count().pprint()
>     ssc.start()
>     ssc.awaitTermination()
> {code}
> {code:title=Stack trace|collapse=true}
> Traceback (most recent call last):
>   File "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/util.py", 
> line 62, in call
>     r = self.func(t, *rdds)
>   File 
> "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/dstream.py", line 
> 616, in <lambda>
>     self.func = lambda t, rdd: func(t, prev_func(t, rdd))
>   File 
> "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/dstream.py", line 
> 616, in <lambda>
>     self.func = lambda t, rdd: func(t, prev_func(t, rdd))
>   File 
> "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/dstream.py", line 
> 616, in <lambda>
>     self.func = lambda t, rdd: func(t, prev_func(t, rdd))
>   File 
> "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/dstream.py", line 
> 616, in <lambda>
>     self.func = lambda t, rdd: func(t, prev_func(t, rdd))
>   File "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", 
> line 332, in <lambda>
>     func = lambda t, rdd: oldfunc(rdd)
>   File "/home/spark/ad_realtime/batch/kafka_test.py", line 7, in 
> attach_kafka_metadata
>     offset_ranges = kafka_rdd.offsetRanges()
> AttributeError: 'RDD' object has no attribute 'offsetRanges'
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10143) Parquet changed the behavior of calculating splits

2015-08-20 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-10143:
-
Component/s: SQL

> Parquet changed the behavior of calculating splits
> --
>
> Key: SPARK-10143
> URL: https://issues.apache.org/jira/browse/SPARK-10143
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yin Huai
>Priority: Critical
>
> When Parquet's task-side metadata is enabled (it is enabled by default, and it 
> needs to be enabled to deal with tables with many files), Parquet delegates 
> the work of calculating initial splits to FileInputFormat (see 
> https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputFormat.java#L301-L311).
>  If the filesystem's block size is smaller than the row group size and users do 
> not set a min split size, the initial split list will contain lots of dummy 
> splits, and they contribute to empty tasks (because the starting point and 
> ending point of such a split do not cover the starting point of a row 
> group). 
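
A quick back-of-the-envelope sketch of the effect, with assumed, illustrative 
sizes:

{code:title=split_math.py (illustrative numbers)}
# Assumed sizes: 1 GB Parquet row group, 128 MB HDFS block, no min split size set.
row_group_size = 1024 * 1024 * 1024
hdfs_block_size = 128 * 1024 * 1024

splits_per_row_group = row_group_size // hdfs_block_size   # 8 initial splits
useful_splits = 1  # only the split covering the row group's starting point reads data
empty_tasks = splits_per_row_group - useful_splits         # 7 dummy splits -> empty tasks
print(splits_per_row_group, empty_tasks)
{code}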



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-10145) Executor exit without useful messages when spark runs in spark-streaming

2015-08-20 Thread Baogang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706139#comment-14706139
 ] 

Baogang Wang edited comment on SPARK-10145 at 8/21/15 3:27 AM:
---

spark.serializer org.apache.spark.serializer.KryoSerializer
spark.akka.frameSize  1024
spark.driver.extraJavaOptions   -Dhdp.version=2.2.0.0-2041
spark.yarn.am.extraJavaOptions  -Dhdp.version=2.2.0.0-2041
spark.akka.timeout  900
spark.storage.memoryFraction  0.4
spark.rdd.compress  true
spark.shuffle.blockTransferService  nio
spark.yarn.executor.memoryOverhead  1024


was (Author: heayin):
# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.

# Example:
# spark.master spark://master:7077
# spark.eventLog.enabled   true
# spark.eventLog.dir   hdfs://namenode:8021/directory
spark.serializer org.apache.spark.serializer.KryoSerializer
# spark.driver.memory  5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value 
-Dnumbers="one two three"
#spark.core.connection.ack.wait.timeout 3600
#spark.core.connection.auth.wait.timeout  3600
spark.akka.frameSize  1024
spark.driver.extraJavaOptions   -Dhdp.version=2.2.0.0-2041
spark.yarn.am.extraJavaOptions  -Dhdp.version=2.2.0.0-2041
spark.akka.timeout  900
spark.storage.memoryFraction  0.4
spark.rdd.compress  true
spark.shuffle.blockTransferService  nio
spark.yarn.executor.memoryOverhead  1024

> Executor exit without useful messages when spark runs in spark-streaming
> 
>
> Key: SPARK-10145
> URL: https://issues.apache.org/jira/browse/SPARK-10145
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming, YARN
> Environment: spark 1.3.1, hadoop 2.6.0, 6 nodes, each node has 32 
> cores and 32g memory  
>Reporter: Baogang Wang
>Priority: Critical
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Each node is allocated 30g memory by Yarn.
> My application receives messages from Kafka via a direct stream. Each application 
> consists of 4 DStream windows.
> The Spark application is submitted with this command:
> spark-submit --class spark_security.safe.SafeSockPuppet  --driver-memory 3g 
> --executor-memory 3g --num-executors 3 --executor-cores 4  --name 
> safeSparkDealerUser --master yarn  --deploy-mode cluster  
> spark_Security-1.0-SNAPSHOT.jar.nocalse 
> hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/spark_properties/safedealer.properties
> After about 1 hour, some executors exit. There are no more YARN logs after 
> an executor exits, and there is no stack trace when it exits.
> The YARN NodeManager log shows the following:
> 2015-08-17 17:25:41,550 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Start request for container_1439803298368_0005_01_01 by user root
> 2015-08-17 17:25:41,551 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Creating a new application reference for app application_1439803298368_0005
> 2015-08-17 17:25:41,551 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root   
> IP=172.19.160.102   OPERATION=Start Container Request   
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1439803298368_0005
> CONTAINERID=container_1439803298368_0005_01_01
> 2015-08-17 17:25:41,551 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1439803298368_0005 transitioned from NEW to INITING
> 2015-08-17 17:25:41,552 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Adding container_1439803298368_0005_01_01 to application 
> application_1439803298368_0005
> 2015-08-17 17:25:41,557 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
>  rollingMonitorInterval is set as -1. The log rolling mornitoring interval is 
> disabled. The logs will be aggregated after this application is finished.
> 2015-08-17 17:25:41,663 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1439803298368_0005 transitioned from INITING to 
> RUNNING
> 2015-08-17 17:25:41,664 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1439803298368_0005_01_01 transitioned from NEW to 
> LOCALIZING
> 2015-08-17 17:25:41,664 INFO 
> org.apache

[jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)

2015-08-20 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706197#comment-14706197
 ] 

Xiangrui Meng commented on SPARK-6192:
--

[~srblakcHwak] As I mentioned above, it would be great if you could start with 
some small features or help review others' PRs. We need to know each other 
before we can plan a GSoC project. This is a good place to start: 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

> Enhance MLlib's Python API (GSoC 2015)
> --
>
> Key: SPARK-6192
> URL: https://issues.apache.org/jira/browse/SPARK-6192
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML, MLlib, PySpark
>Reporter: Xiangrui Meng
>Assignee: Manoj Kumar
>  Labels: gsoc, gsoc2015, mentor
>
> This is an umbrella JIRA for [~MechCoder]'s GSoC 2015 project. The main theme 
> is to enhance MLlib's Python API, to make it on par with the Scala/Java API. 
> The main tasks are:
> 1. For all models in MLlib, provide save/load method. This also
> includes save/load in Scala.
> 2. Python API for evaluation metrics.
> 3. Python API for streaming ML algorithms.
> 4. Python API for distributed linear algebra.
> 5. Simplify MLLibPythonAPI using DataFrames. Currently, we use
> customized serialization, making MLLibPythonAPI hard to maintain. It
> would be nice to use the DataFrames for serialization.
> I'll link the JIRAs for each of the tasks.
> Note that this doesn't mean all these JIRAs are pre-assigned to [~MechCoder]. 
> The TODO list will be dynamic based on the backlog.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10145) Executor exit without useful messages when spark runs in spark-streaming

2015-08-20 Thread Baogang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706195#comment-14706195
 ] 

Baogang Wang commented on SPARK-10145:
--

The streaming batch interval is 1 second.
The window widths are 60 seconds, 180 seconds, 300 seconds, and 600 seconds.

> Executor exit without useful messages when spark runs in spark-streaming
> 
>
> Key: SPARK-10145
> URL: https://issues.apache.org/jira/browse/SPARK-10145
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming, YARN
> Environment: spark 1.3.1, hadoop 2.6.0, 6 nodes, each node has 32 
> cores and 32g memory  
>Reporter: Baogang Wang
>Priority: Critical
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Each node is allocated 30g memory by Yarn.
> My application receives messages from Kafka via a direct stream. Each application 
> consists of 4 DStream windows.
> The Spark application is submitted with this command:
> spark-submit --class spark_security.safe.SafeSockPuppet  --driver-memory 3g 
> --executor-memory 3g --num-executors 3 --executor-cores 4  --name 
> safeSparkDealerUser --master yarn  --deploy-mode cluster  
> spark_Security-1.0-SNAPSHOT.jar.nocalse 
> hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/spark_properties/safedealer.properties
> After about 1 hour, some executors exit. There are no more YARN logs after 
> an executor exits, and there is no stack trace when it exits.
> The YARN NodeManager log shows the following:
> 2015-08-17 17:25:41,550 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Start request for container_1439803298368_0005_01_01 by user root
> 2015-08-17 17:25:41,551 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Creating a new application reference for app application_1439803298368_0005
> 2015-08-17 17:25:41,551 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root   
> IP=172.19.160.102   OPERATION=Start Container Request   
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1439803298368_0005
> CONTAINERID=container_1439803298368_0005_01_01
> 2015-08-17 17:25:41,551 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1439803298368_0005 transitioned from NEW to INITING
> 2015-08-17 17:25:41,552 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Adding container_1439803298368_0005_01_01 to application 
> application_1439803298368_0005
> 2015-08-17 17:25:41,557 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
>  rollingMonitorInterval is set as -1. The log rolling mornitoring interval is 
> disabled. The logs will be aggregated after this application is finished.
> 2015-08-17 17:25:41,663 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1439803298368_0005 transitioned from INITING to 
> RUNNING
> 2015-08-17 17:25:41,664 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1439803298368_0005_01_01 transitioned from NEW to 
> LOCALIZING
> 2015-08-17 17:25:41,664 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got 
> event CONTAINER_INIT for appId application_1439803298368_0005
> 2015-08-17 17:25:41,664 INFO 
> org.apache.spark.network.yarn.YarnShuffleService: Initializing container 
> container_1439803298368_0005_01_01
> 2015-08-17 17:25:41,665 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/user/root/.sparkStaging/application_1439803298368_0005/spark-assembly-1.3.1-hadoop2.6.0.jar
>  transitioned from INIT to DOWNLOADING
> 2015-08-17 17:25:41,665 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/user/root/.sparkStaging/application_1439803298368_0005/spark_Security-1.0-SNAPSHOT.jar
>  transitioned from INIT to DOWNLOADING
> 2015-08-17 17:25:41,665 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Created localizer for container_1439803298368_0005_01_01
> 2015-08-17 17:25:41,668 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Writing credentials to the nmPrivate file 
> /export/servers/hadoop2.6.0/tmp/nm-local-dir/nmPrivate/container_1439803298368_0005_01_01.tokens.
>  Credentials list: 
> 2015-08-17 17:25:41,682 INFO 
> org.apache.hadoop.ya

[jira] [Updated] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)

2015-08-20 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-6192:
-
Target Version/s:   (was: 1.5.0)

> Enhance MLlib's Python API (GSoC 2015)
> --
>
> Key: SPARK-6192
> URL: https://issues.apache.org/jira/browse/SPARK-6192
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML, MLlib, PySpark
>Reporter: Xiangrui Meng
>Assignee: Manoj Kumar
>  Labels: gsoc, gsoc2015, mentor
>
> This is an umbrella JIRA for [~MechCoder]'s GSoC 2015 project. The main theme 
> is to enhance MLlib's Python API, to make it on par with the Scala/Java API. 
> The main tasks are:
> 1. For all models in MLlib, provide save/load method. This also
> includes save/load in Scala.
> 2. Python API for evaluation metrics.
> 3. Python API for streaming ML algorithms.
> 4. Python API for distributed linear algebra.
> 5. Simplify MLLibPythonAPI using DataFrames. Currently, we use
> customized serialization, making MLLibPythonAPI hard to maintain. It
> would be nice to use the DataFrames for serialization.
> I'll link the JIRAs for each of the tasks.
> Note that this doesn't mean all these JIRAs are pre-assigned to [~MechCoder]. 
> The TODO list will be dynamic based on the backlog.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)

2015-08-20 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-6192:
-
Target Version/s: 1.5.0

> Enhance MLlib's Python API (GSoC 2015)
> --
>
> Key: SPARK-6192
> URL: https://issues.apache.org/jira/browse/SPARK-6192
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML, MLlib, PySpark
>Reporter: Xiangrui Meng
>Assignee: Manoj Kumar
>  Labels: gsoc, gsoc2015, mentor
>
> This is an umbrella JIRA for [~MechCoder]'s GSoC 2015 project. The main theme 
> is to enhance MLlib's Python API, to make it on par with the Scala/Java API. 
> The main tasks are:
> 1. For all models in MLlib, provide save/load method. This also
> includes save/load in Scala.
> 2. Python API for evaluation metrics.
> 3. Python API for streaming ML algorithms.
> 4. Python API for distributed linear algebra.
> 5. Simplify MLLibPythonAPI using DataFrames. Currently, we use
> customized serialization, making MLLibPythonAPI hard to maintain. It
> would be nice to use the DataFrames for serialization.
> I'll link the JIRAs for each of the tasks.
> Note that this doesn't mean all these JIRAs are pre-assigned to [~MechCoder]. 
> The TODO list will be dynamic based on the backlog.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)

2015-08-20 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706191#comment-14706191
 ] 

Xiangrui Meng commented on SPARK-6192:
--

Not yet, officially.

> Enhance MLlib's Python API (GSoC 2015)
> --
>
> Key: SPARK-6192
> URL: https://issues.apache.org/jira/browse/SPARK-6192
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML, MLlib, PySpark
>Reporter: Xiangrui Meng
>Assignee: Manoj Kumar
>  Labels: gsoc, gsoc2015, mentor
>
> This is an umbrella JIRA for [~MechCoder]'s GSoC 2015 project. The main theme 
> is to enhance MLlib's Python API, to make it on par with the Scala/Java API. 
> The main tasks are:
> 1. For all models in MLlib, provide save/load method. This also
> includes save/load in Scala.
> 2. Python API for evaluation metrics.
> 3. Python API for streaming ML algorithms.
> 4. Python API for distributed linear algebra.
> 5. Simplify MLLibPythonAPI using DataFrames. Currently, we use
> customized serialization, making MLLibPythonAPI hard to maintain. It
> would be nice to use the DataFrames for serialization.
> I'll link the JIRAs for each of the tasks.
> Note that this doesn't mean all these JIRAs are pre-assigned to [~MechCoder]. 
> The TODO list will be dynamic based on the backlog.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8530) Add Python API for MinMaxScaler

2015-08-20 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-8530:
-
Component/s: PySpark

> Add Python API for MinMaxScaler
> ---
>
> Key: SPARK-8530
> URL: https://issues.apache.org/jira/browse/SPARK-8530
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: yuhao yang
>Assignee: yuhao yang
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8050) Make Savable and Loader Java-friendly.

2015-08-20 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-8050:
-
Target Version/s: 1.6.0  (was: 1.5.0)

> Make Savable and Loader Java-friendly.
> --
>
> Key: SPARK-8050
> URL: https://issues.apache.org/jira/browse/SPARK-8050
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>Priority: Minor
>
> Should overload save/load to accept JavaSparkContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8530) Add Python API for MinMaxScaler

2015-08-20 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-8530:
-
Target Version/s: 1.6.0  (was: 1.5.0)

> Add Python API for MinMaxScaler
> ---
>
> Key: SPARK-8530
> URL: https://issues.apache.org/jira/browse/SPARK-8530
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: yuhao yang
>Assignee: yuhao yang
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10122) AttributeError: 'RDD' object has no attribute 'offsetRanges'

2015-08-20 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706186#comment-14706186
 ] 

Saisai Shao commented on SPARK-10122:
-

Hi [~aramesh], thanks a lot for pointing this out. This is actually a bug, 
sorry for not covering it in the unit test.

The problem is Python will compact a series of {{TransformedDStream}} into one:

{code}
if (isinstance(prev, TransformedDStream) and
        not prev.is_cached and not prev.is_checkpointed):
    prev_func = prev.func
    self.func = lambda t, rdd: func(t, prev_func(t, rdd))
    self.prev = prev.prev
{code}

Since {{KafkaTransformedDStream}} is a subclass of {{TransformedDStream}}, it 
gets compacted and replaced with its parent DStream (that is what 
{{self.prev = prev.prev}} does). The result is a plain DStream, so getting 
offset ranges on it throws the exception you mentioned before.

I will submit a PR to fix this; you can try the patch to see whether it is 
fixed.
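
In the meantime, one possible workaround suggested by the {{not prev.is_cached}} 
check above is to cache the transformed stream so it is not folded into its 
parent. This is only a sketch reusing the names from the example in the 
description, and it is untested:

{code:title=workaround_sketch.py}
# Sketch only, reusing kafka_stream and attach_kafka_metadata from kafka_test.py
# above. Compaction is skipped for cached streams (the "not prev.is_cached"
# check), so caching may keep offsetRanges() reachable. Untested workaround.
transformed = kafka_stream.transform(attach_kafka_metadata).cache()
transformed.count().pprint()
{code}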

> AttributeError: 'RDD' object has no attribute 'offsetRanges'
> 
>
> Key: SPARK-10122
> URL: https://issues.apache.org/jira/browse/SPARK-10122
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Streaming
>Reporter: Amit Ramesh
>  Labels: kafka
>
> SPARK-8389 added the offsetRanges interface to Kafka direct streams. This 
> however appears to break when chaining operations after a transform 
> operation. Following is example code that would result in an error (stack 
> trace below). Note that if the 'count()' operation is taken out of the 
> example code then this error does not occur anymore, and the Kafka data is 
> printed.
> {code:title=kafka_test.py|collapse=true}
> from pyspark import SparkContext
> from pyspark.streaming import StreamingContext
> from pyspark.streaming.kafka import KafkaUtils
> def attach_kafka_metadata(kafka_rdd):
> offset_ranges = kafka_rdd.offsetRanges()
> return kafka_rdd
> if __name__ == "__main__":
> sc = SparkContext(appName='kafka-test')
> ssc = StreamingContext(sc, 10)
> kafka_stream = KafkaUtils.createDirectStream(
> ssc,
> [TOPIC],
> kafkaParams={
> 'metadata.broker.list': BROKERS,
> },
> )
> kafka_stream.transform(attach_kafka_metadata).count().pprint()
> ssc.start()
> ssc.awaitTermination()
> {code}
> {code:title=Stack trace|collapse=true}
> Traceback (most recent call last):
>   File "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/util.py", 
> line 62, in call
> r = self.func(t, *rdds)
>   File 
> "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/dstream.py", line 
> 616, in 
> self.func = lambda t, rdd: func(t, prev_func(t, rdd))
>   File 
> "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/dstream.py", line 
> 616, in 
> self.func = lambda t, rdd: func(t, prev_func(t, rdd))
>   File 
> "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/dstream.py", line 
> 616, in 
> self.func = lambda t, rdd: func(t, prev_func(t, rdd))
>   File 
> "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/dstream.py", line 
> 616, in 
> self.func = lambda t, rdd: func(t, prev_func(t, rdd))
>   File "/home/spark/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", 
> line 332, in 
> func = lambda t, rdd: oldfunc(rdd)
>   File "/home/spark/ad_realtime/batch/kafka_test.py", line 7, in 
> attach_kafka_metadata
> offset_ranges = kafka_rdd.offsetRanges()
> AttributeError: 'RDD' object has no attribute 'offsetRanges'
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-8854) Documentation for Association Rules

2015-08-20 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-8854.
--
Resolution: Duplicate

> Documentation for Association Rules
> ---
>
> Key: SPARK-8854
> URL: https://issues.apache.org/jira/browse/SPARK-8854
> Project: Spark
>  Issue Type: Documentation
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> Documentation describing how to generate association rules from frequent 
> itemsets needs to be provided. The relevant method is 
> {{FPGrowthModel.generateAssociationRules}}. This will likely be added to the 
> existing section for frequent-itemsets using FPGrowth.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8521) Feature Transformers in 1.5

2015-08-20 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-8521:
-
Fix Version/s: 1.5.0

> Feature Transformers in 1.5
> ---
>
> Key: SPARK-8521
> URL: https://issues.apache.org/jira/browse/SPARK-8521
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
> Fix For: 1.5.0
>
>
> This is a list of feature transformers we plan to add in Spark 1.5. Feel free 
> to propose useful transformers that are not on the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7455) Perf test for LDA (EM/online)

2015-08-20 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-7455:
-
Fix Version/s: 1.5.0

> Perf test for LDA (EM/online)
> -
>
> Key: SPARK-7455
> URL: https://issues.apache.org/jira/browse/SPARK-7455
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 1.4.0
>Reporter: Xiangrui Meng
>Assignee: yuhao yang
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7536) Audit MLlib Python API for 1.4

2015-08-20 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-7536:
-
Fix Version/s: 1.5.0

> Audit MLlib Python API for 1.4
> --
>
> Key: SPARK-7536
> URL: https://issues.apache.org/jira/browse/SPARK-7536
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
> Fix For: 1.5.0
>
>
> **NOTE: This is targeted at 1.5.0 because it has so many useful links for 
> JIRAs targeted for 1.5.0.  In the future, we should create a _new_ JIRA for 
> linking future items.**
> For new public APIs added to MLlib, we need to check the generated HTML doc 
> and compare the Scala & Python versions.  We need to track:
> * Inconsistency: Do class/method/parameter names match? SPARK-7667
> * Docs: Is the Python doc missing or just a stub?  We want the Python doc to 
> be as complete as the Scala doc. [SPARK-7666], [SPARK-6173]
> * API breaking changes: These should be very rare but are occasionally either 
> necessary (intentional) or accidental.  These must be recorded and added in 
> the Migration Guide for this release. SPARK-7665
> ** Note: If the API change is for an Alpha/Experimental/DeveloperApi 
> component, please note that as well.
> * Missing classes/methods/parameters: We should create to-do JIRAs for 
> functionality missing from Python.
> ** classification
> *** StreamingLogisticRegressionWithSGD SPARK-7633
> ** clustering
> *** GaussianMixture SPARK-6258
> *** LDA SPARK-6259
> *** Power Iteration Clustering SPARK-5962
> *** StreamingKMeans SPARK-4118 
> ** evaluation
> *** MultilabelMetrics SPARK-6094 
> ** feature
> *** ElementwiseProduct SPARK-7605
> *** PCA SPARK-7604
> ** linalg
> *** Distributed linear algebra SPARK-6100
> ** pmml.export SPARK-7638
> ** regression
> *** StreamingLinearRegressionWithSGD SPARK-4127
> ** stat
> *** KernelDensity SPARK-7639
> ** util
> *** MLUtils SPARK-6263 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8633) List missing model methods in Python Pipeline API

2015-08-20 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-8633:
-
Fix Version/s: 1.5.0

> List missing model methods in Python Pipeline API
> -
>
> Key: SPARK-8633
> URL: https://issues.apache.org/jira/browse/SPARK-8633
> Project: Spark
>  Issue Type: Task
>  Components: ML, PySpark
>Reporter: Xiangrui Meng
>Assignee: Manoj Kumar
> Fix For: 1.5.0
>
>
> Most Python models under the pipeline API are implemented as JavaModel 
> wrappers. However, we didn't provide methods to extract information from 
> model. In SPARK-7647, we added weights and intercept to linear models. This 
> JIRA is to list all missing model methods, create JIRAs for each, and link 
> them here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9712) List source compatibility issues in Scala API from scaladocs

2015-08-20 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-9712:
-
Fix Version/s: 1.5.0

> List source compatibility issues in Scala API from scaladocs
> 
>
> Key: SPARK-9712
> URL: https://issues.apache.org/jira/browse/SPARK-9712
> Project: Spark
>  Issue Type: Task
>  Components: ML, MLlib
>Reporter: Feynman Liang
>Assignee: Feynman Liang
> Fix For: 1.5.0
>
> Attachments: scaladoc-compare.diff
>
>
> Generate raw scaladocs and use {{scala/tools/scaladoc-compare}} to show 
> changes to public APIs and documentations. These results are access-modifier 
> aware since they run on the Scala source rather than generated classfiles, 
> but will include documentation changes which may not affect behavior.
> Results attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9706) List Binary and Source Compatibility Issues with japi-compliance checker

2015-08-20 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-9706:
-
Fix Version/s: 1.5.0

> List Binary and Source Compatibility Issues with japi-compliance checker
> 
>
> Key: SPARK-9706
> URL: https://issues.apache.org/jira/browse/SPARK-9706
> Project: Spark
>  Issue Type: Task
>  Components: ML, MLlib
>Reporter: Feynman Liang
>Assignee: Feynman Liang
> Fix For: 1.5.0
>
> Attachments: compat_reports.zip
>
>
> To identify potential API issues, list public API changes which affect binary 
> and source incompatibility by using command:
> {code}
> japi-compliance-checker spark-mllib_2.10-1.4.2-SNAPSHOT.jar 
> spark-mllib_2.10-1.5.0-SNAPSHOT.jar
> {code}
> Report result attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8400) ml.ALS doesn't handle -1 block size

2015-08-20 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706182#comment-14706182
 ] 

Xiangrui Meng commented on SPARK-8400:
--

Sorry for my late reply! We check numBlocks in LocalIndexEncoder. However, I'm 
not sure whether this happens before any data shuffling. It might be better to 
check numUserBlocks and numItemBlocks directly.
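
For illustration only, a rough PySpark sketch of the two block-size knobs being 
discussed; the sample ratings and the existing {{sc}} are placeholders, and the 
claim about -1 handling is taken from this report rather than verified here.

{code}
# Hedged illustration: spark.mllib resolves blocks=-1 automatically based on the
# input partitioning, while spark.ml's numUserBlocks/numItemBlocks reportedly do
# not handle -1 and fail silently.
from pyspark.mllib.recommendation import ALS as MLlibALS
from pyspark.ml.recommendation import ALS as MLALS

ratings_rdd = sc.parallelize([(0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0)])
mllib_model = MLlibALS.train(ratings_rdd, rank=10, blocks=-1)   # -1 means "auto"
ml_als = MLALS(rank=10, numUserBlocks=-1, numItemBlocks=-1)     # -1 not handled, per this issue
{code}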

> ml.ALS doesn't handle -1 block size
> ---
>
> Key: SPARK-8400
> URL: https://issues.apache.org/jira/browse/SPARK-8400
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.3.1
>Reporter: Xiangrui Meng
>Assignee: Bryan Cutler
>
> Under spark.mllib, if the number of blocks is set to -1, we set the block size 
> automatically based on the input partition size. However, this behavior is 
> not preserved in the spark.ml API. If a user sets -1 in Spark 1.3, it will not 
> work, but no error message will be shown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10137) Avoid to restart receivers if scheduleReceivers returns balanced results

2015-08-20 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-10137:
--
Assignee: Shixiong Zhu

> Avoid to restart receivers if scheduleReceivers returns balanced results
> 
>
> Key: SPARK-10137
> URL: https://issues.apache.org/jira/browse/SPARK-10137
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Critical
>
> In some cases, even if scheduleReceivers returns balanced results, 
> ReceiverTracker still may reject some receivers and force them to restart. 
> See my PR for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10137) Avoid to restart receivers if scheduleReceivers returns balanced results

2015-08-20 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-10137:
--
Priority: Critical  (was: Major)

> Avoid to restart receivers if scheduleReceivers returns balanced results
> 
>
> Key: SPARK-10137
> URL: https://issues.apache.org/jira/browse/SPARK-10137
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Shixiong Zhu
>Priority: Critical
>
> In some cases, even if scheduleReceivers returns balanced results, 
> ReceiverTracker still may reject some receivers and force them to restart. 
> See my PR for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-9848) Add @Since annotation to new public APIs in 1.5

2015-08-20 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706178#comment-14706178
 ] 

Xiangrui Meng edited comment on SPARK-9848 at 8/21/15 3:10 AM:
---

No, that would be too much for this release. We plan to do that after 1.5. If 
you cannot find more new public APIs under spark.mllib, we can mark this as 
resolved.


was (Author: mengxr):
No, that would be too much for this release. We plan to do that after 1.5.

> Add @Since annotation to new public APIs in 1.5
> ---
>
> Key: SPARK-9848
> URL: https://issues.apache.org/jira/browse/SPARK-9848
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, ML, MLlib
>Reporter: Xiangrui Meng
>Assignee: Manoj Kumar
>Priority: Critical
>  Labels: starter
>
> We should get a list of new APIs from SPARK-9660. cc: [~fliang]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8665) Update ALS documentation to include performance tips

2015-08-20 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-8665:
-
Target Version/s: 1.6.0  (was: 1.5.0)

> Update ALS documentation to include performance tips
> 
>
> Key: SPARK-8665
> URL: https://issues.apache.org/jira/browse/SPARK-8665
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Affects Versions: 1.4.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> With the new ALS implementation, users still need to deal with 
> computation/communication trade-offs. It would be nice to document this 
> clearly based on the issues on the mailing list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9848) Add @Since annotation to new public APIs in 1.5

2015-08-20 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706178#comment-14706178
 ] 

Xiangrui Meng commented on SPARK-9848:
--

No, that would be too much for this release. We plan to do that after 1.5.

> Add @Since annotation to new public APIs in 1.5
> ---
>
> Key: SPARK-9848
> URL: https://issues.apache.org/jira/browse/SPARK-9848
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, ML, MLlib
>Reporter: Xiangrui Meng
>Assignee: Manoj Kumar
>Priority: Critical
>  Labels: starter
>
> We should get a list of new APIs from SPARK-9660. cc: [~fliang]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9846) User guide for Multilayer Perceptron Classifier

2015-08-20 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-9846.
--
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 8262
[https://github.com/apache/spark/pull/8262]

> User guide for Multilayer Perceptron Classifier
> ---
>
> Key: SPARK-9846
> URL: https://issues.apache.org/jira/browse/SPARK-9846
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, ML
>Affects Versions: 1.5.0
>Reporter: Xiangrui Meng
>Assignee: Alexander Ulanov
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-10140) Add target fields to @Since annotation

2015-08-20 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-10140.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 8344
[https://github.com/apache/spark/pull/8344]

> Add target fields to @Since annotation
> --
>
> Key: SPARK-10140
> URL: https://issues.apache.org/jira/browse/SPARK-10140
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
> Fix For: 1.5.0
>
>
> Add target fields to @Since so constructor params and fields also get 
> annotated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9228) Combine unsafe and codegen into a single option

2015-08-20 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706152#comment-14706152
 ] 

Yi Zhou commented on SPARK-9228:


Hi [~davies],
After introducing 'spark.sql.tungsten.enabled', does it mean that the previous two 
settings (spark.sql.unsafe.enabled and spark.sql.codegen) will both be deprecated 
or removed? Currently I can still show the parameters in the Spark SQL CLI, as 
below:

{code}
15/08/21 10:28:54 INFO DAGScheduler: Job 6 finished: processCmd at 
CliDriver.java:376, took 0.191960 s
spark.sql.unsafe.enabled    true
Time taken: 0.253 seconds, Fetched 1 row(s)

15/08/21 10:34:10 INFO DAGScheduler: Job 7 finished: processCmd at 
CliDriver.java:376, took 0.284666 s
spark.sql.codegen   true
Time taken: 0.336 seconds, Fetched 1 row(s)
15/08/21 10:34:10 INFO CliDriver: Time taken: 0.336 seconds, Fetched 1 row(s)
{code}
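
For comparison, the same check can be run from PySpark as well; this is a hedged 
sketch that assumes an existing {{sqlContext}} and relies on {{SET <key>}} (without 
a value) printing the current setting.

{code}
# Hedged sketch: "SET <key>" returns a one-row DataFrame with the key and its
# current value, so the consolidated flag and the two legacy flags can be
# inspected side by side.
for key in ["spark.sql.tungsten.enabled",
            "spark.sql.unsafe.enabled",
            "spark.sql.codegen"]:
    sqlContext.sql("SET " + key).show()
{code}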



> Combine unsafe and codegen into a single option
> ---
>
> Key: SPARK-9228
> URL: https://issues.apache.org/jira/browse/SPARK-9228
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Davies Liu
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Before QA, lets flip on features and consolidate unsafe and codegen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10145) Executor exit without useful messages when spark runs in spark-streaming

2015-08-20 Thread Baogang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706143#comment-14706143
 ] 

Baogang Wang commented on SPARK-10145:
--

4 other applications run at the same time, and each of them has the same number of 
executors. 3g of memory is allocated to each executor.

> Executor exit without useful messages when spark runs in spark-streaming
> 
>
> Key: SPARK-10145
> URL: https://issues.apache.org/jira/browse/SPARK-10145
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming, YARN
> Environment: spark 1.3.1, hadoop 2.6.0, 6 nodes, each node has 32 
> cores and 32g memory  
>Reporter: Baogang Wang
>Priority: Critical
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Each node is allocated 30g of memory by YARN.
> My application receives messages from Kafka via a direct stream. Each application 
> consists of 4 DStream windows.
> Spark application is submitted by this command:
> spark-submit --class spark_security.safe.SafeSockPuppet  --driver-memory 3g 
> --executor-memory 3g --num-executors 3 --executor-cores 4  --name 
> safeSparkDealerUser --master yarn  --deploy-mode cluster  
> spark_Security-1.0-SNAPSHOT.jar.nocalse 
> hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/spark_properties/safedealer.properties
> After about 1 hour, some executors exit. There are no more YARN logs after an 
> executor exits, and there is no stack trace when it exits.
> The YARN node manager log shows the following:
> 2015-08-17 17:25:41,550 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Start request for container_1439803298368_0005_01_01 by user root
> 2015-08-17 17:25:41,551 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Creating a new application reference for app application_1439803298368_0005
> 2015-08-17 17:25:41,551 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root   
> IP=172.19.160.102   OPERATION=Start Container Request   
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1439803298368_0005
> CONTAINERID=container_1439803298368_0005_01_01
> 2015-08-17 17:25:41,551 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1439803298368_0005 transitioned from NEW to INITING
> 2015-08-17 17:25:41,552 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Adding container_1439803298368_0005_01_01 to application 
> application_1439803298368_0005
> 2015-08-17 17:25:41,557 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
>  rollingMonitorInterval is set as -1. The log rolling mornitoring interval is 
> disabled. The logs will be aggregated after this application is finished.
> 2015-08-17 17:25:41,663 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1439803298368_0005 transitioned from INITING to 
> RUNNING
> 2015-08-17 17:25:41,664 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1439803298368_0005_01_01 transitioned from NEW to 
> LOCALIZING
> 2015-08-17 17:25:41,664 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got 
> event CONTAINER_INIT for appId application_1439803298368_0005
> 2015-08-17 17:25:41,664 INFO 
> org.apache.spark.network.yarn.YarnShuffleService: Initializing container 
> container_1439803298368_0005_01_01
> 2015-08-17 17:25:41,665 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/user/root/.sparkStaging/application_1439803298368_0005/spark-assembly-1.3.1-hadoop2.6.0.jar
>  transitioned from INIT to DOWNLOADING
> 2015-08-17 17:25:41,665 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/user/root/.sparkStaging/application_1439803298368_0005/spark_Security-1.0-SNAPSHOT.jar
>  transitioned from INIT to DOWNLOADING
> 2015-08-17 17:25:41,665 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Created localizer for container_1439803298368_0005_01_01
> 2015-08-17 17:25:41,668 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Writing credentials to the nmPrivate file 
> /export/servers/hadoop2.6.0/tmp/nm-local-dir/nmPrivate/container_1439803298368_0005_01_01.tokens.
>  Credentials list: 
> 2015-08-17 17:25:4

[jira] [Comment Edited] (SPARK-10145) Executor exit without useful messages when spark runs in spark-streaming

2015-08-20 Thread Baogang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706139#comment-14706139
 ] 

Baogang Wang edited comment on SPARK-10145 at 8/21/15 2:34 AM:
---

# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.

# Example:
# spark.master spark://master:7077
# spark.eventLog.enabled   true
# spark.eventLog.dir   hdfs://namenode:8021/directory
spark.serializer org.apache.spark.serializer.KryoSerializer
# spark.driver.memory  5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value 
-Dnumbers="one two three"
#spark.core.connection.ack.wait.timeout 3600
#spark.core.connection.auth.wait.timeout    3600
spark.akka.frameSize    1024
spark.driver.extraJavaOptions   -Dhdp.version=2.2.0.0-2041
spark.yarn.am.extraJavaOptions  -Dhdp.version=2.2.0.0-2041
spark.akka.timeout  900
spark.storage.memoryFraction    0.4
spark.rdd.compress  true
spark.shuffle.blockTransferService  nio
spark.yarn.executor.memoryOverhead  1024


was (Author: heayin):
the spark-defaults.conf is as follows:
# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.

# Example:
# spark.master spark://master:7077
# spark.eventLog.enabled   true
# spark.eventLog.dir   hdfs://namenode:8021/directory
spark.serializer org.apache.spark.serializer.KryoSerializer
# spark.driver.memory  5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value 
-Dnumbers="one two three"
#spark.core.connection.ack.wait.timeout 3600
#spark.core.connection.auth.wait.timeout    3600
spark.akka.frameSize    1024
spark.driver.extraJavaOptions   -Dhdp.version=2.2.0.0-2041
spark.yarn.am.extraJavaOptions  -Dhdp.version=2.2.0.0-2041
spark.akka.timeout  900
spark.storage.memoryFraction    0.4
spark.rdd.compress  true
spark.shuffle.blockTransferService  nio
spark.yarn.executor.memoryOverhead  1024

> Executor exit without useful messages when spark runs in spark-streaming
> 
>
> Key: SPARK-10145
> URL: https://issues.apache.org/jira/browse/SPARK-10145
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming, YARN
> Environment: spark 1.3.1, hadoop 2.6.0, 6 nodes, each node has 32 
> cores and 32g memory  
>Reporter: Baogang Wang
>Priority: Critical
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Each node is allocated 30g of memory by YARN.
> My application receives messages from Kafka via a direct stream. Each application 
> consists of 4 DStream windows.
> Spark application is submitted by this command:
> spark-submit --class spark_security.safe.SafeSockPuppet  --driver-memory 3g 
> --executor-memory 3g --num-executors 3 --executor-cores 4  --name 
> safeSparkDealerUser --master yarn  --deploy-mode cluster  
> spark_Security-1.0-SNAPSHOT.jar.nocalse 
> hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/spark_properties/safedealer.properties
> After about 1 hour, some executors exit. There are no more YARN logs after an 
> executor exits, and there is no stack trace when it exits.
> The YARN node manager log shows the following:
> 2015-08-17 17:25:41,550 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Start request for container_1439803298368_0005_01_01 by user root
> 2015-08-17 17:25:41,551 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Creating a new application reference for app application_1439803298368_0005
> 2015-08-17 17:25:41,551 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root   
> IP=172.19.160.102   OPERATION=Start Container Request   
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1439803298368_0005
> CONTAINERID=container_1439803298368_0005_01_01
> 2015-08-17 17:25:41,551 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1439803298368_0005 transitioned from NEW to INITING
> 2015-08-17 17:25:41,552 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Adding container_1439803298368_0005_01_01 to application 
> application_1439803298368_0005
> 2015-08-17 17:25:41,557 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
>  rollingMonitorInterval is set as -1. The 

[jira] [Commented] (SPARK-10145) Executor exit without useful messages when spark runs in spark-streaming

2015-08-20 Thread Baogang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706139#comment-14706139
 ] 

Baogang Wang commented on SPARK-10145:
--

the spark-defaults.conf is as follows:
# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.

# Example:
# spark.master spark://master:7077
# spark.eventLog.enabled   true
# spark.eventLog.dir   hdfs://namenode:8021/directory
spark.serializer org.apache.spark.serializer.KryoSerializer
# spark.driver.memory  5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value 
-Dnumbers="one two three"
#spark.core.connection.ack.wait.timeout 3600
#spark.core.connection.auth.wait.timeout    3600
spark.akka.frameSize    1024
spark.driver.extraJavaOptions   -Dhdp.version=2.2.0.0-2041
spark.yarn.am.extraJavaOptions  -Dhdp.version=2.2.0.0-2041
spark.akka.timeout  900
spark.storage.memoryFraction    0.4
spark.rdd.compress  true
spark.shuffle.blockTransferService  nio
spark.yarn.executor.memoryOverhead  1024

> Executor exit without useful messages when spark runs in spark-streaming
> 
>
> Key: SPARK-10145
> URL: https://issues.apache.org/jira/browse/SPARK-10145
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming, YARN
> Environment: spark 1.3.1, hadoop 2.6.0, 6 nodes, each node has 32 
> cores and 32g memory  
>Reporter: Baogang Wang
>Priority: Critical
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Each node is allocated 30g of memory by YARN.
> My application receives messages from Kafka via a direct stream. Each application 
> consists of 4 DStream windows.
> Spark application is submitted by this command:
> spark-submit --class spark_security.safe.SafeSockPuppet  --driver-memory 3g 
> --executor-memory 3g --num-executors 3 --executor-cores 4  --name 
> safeSparkDealerUser --master yarn  --deploy-mode cluster  
> spark_Security-1.0-SNAPSHOT.jar.nocalse 
> hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/spark_properties/safedealer.properties
> After about 1 hour, some executors exit. There are no more YARN logs after an 
> executor exits, and there is no stack trace when it exits.
> The YARN node manager log shows the following:
> 2015-08-17 17:25:41,550 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Start request for container_1439803298368_0005_01_01 by user root
> 2015-08-17 17:25:41,551 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Creating a new application reference for app application_1439803298368_0005
> 2015-08-17 17:25:41,551 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root   
> IP=172.19.160.102   OPERATION=Start Container Request   
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1439803298368_0005
> CONTAINERID=container_1439803298368_0005_01_01
> 2015-08-17 17:25:41,551 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1439803298368_0005 transitioned from NEW to INITING
> 2015-08-17 17:25:41,552 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Adding container_1439803298368_0005_01_01 to application 
> application_1439803298368_0005
> 2015-08-17 17:25:41,557 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
>  rollingMonitorInterval is set as -1. The log rolling mornitoring interval is 
> disabled. The logs will be aggregated after this application is finished.
> 2015-08-17 17:25:41,663 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1439803298368_0005 transitioned from INITING to 
> RUNNING
> 2015-08-17 17:25:41,664 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1439803298368_0005_01_01 transitioned from NEW to 
> LOCALIZING
> 2015-08-17 17:25:41,664 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got 
> event CONTAINER_INIT for appId application_1439803298368_0005
> 2015-08-17 17:25:41,664 INFO 
> org.apache.spark.network.yarn.YarnShuffleService: Initializing container 
> container_1439803298368_0005_01_01
> 2015-08-17 17:25:41,665 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/user/root/.sparkStaging/application_1439803298368_0005/

[jira] [Created] (SPARK-10145) Executor exit without useful messages when spark runs in spark-streaming

2015-08-20 Thread Baogang Wang (JIRA)
Baogang Wang created SPARK-10145:


 Summary: Executor exit without useful messages when spark runs in 
spark-streaming
 Key: SPARK-10145
 URL: https://issues.apache.org/jira/browse/SPARK-10145
 Project: Spark
  Issue Type: Bug
  Components: Streaming, YARN
 Environment: spark 1.3.1, hadoop 2.6.0, 6 nodes, each node has 32 
cores and 32g memory  
Reporter: Baogang Wang
Priority: Critical


Each node is allocated 30g of memory by YARN.
My application receives messages from Kafka via a direct stream. Each application 
consists of 4 DStream windows.
Spark application is submitted by this command:
spark-submit --class spark_security.safe.SafeSockPuppet  --driver-memory 3g 
--executor-memory 3g --num-executors 3 --executor-cores 4  --name 
safeSparkDealerUser --master yarn  --deploy-mode cluster  
spark_Security-1.0-SNAPSHOT.jar.nocalse 
hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/spark_properties/safedealer.properties

After about 1 hour, some executors exit. There are no more YARN logs after an 
executor exits, and there is no stack trace when it exits.
The YARN node manager log shows the following:


2015-08-17 17:25:41,550 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Start request for container_1439803298368_0005_01_01 by user root
2015-08-17 17:25:41,551 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Creating a new application reference for app application_1439803298368_0005
2015-08-17 17:25:41,551 INFO 
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root 
IP=172.19.160.102   OPERATION=Start Container Request   
TARGET=ContainerManageImpl  RESULT=SUCCESS  
APPID=application_1439803298368_0005
CONTAINERID=container_1439803298368_0005_01_01
2015-08-17 17:25:41,551 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
 Application application_1439803298368_0005 transitioned from NEW to INITING
2015-08-17 17:25:41,552 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
 Adding container_1439803298368_0005_01_01 to application 
application_1439803298368_0005
2015-08-17 17:25:41,557 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
 rollingMonitorInterval is set as -1. The log rolling mornitoring interval is 
disabled. The logs will be aggregated after this application is finished.
2015-08-17 17:25:41,663 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
 Application application_1439803298368_0005 transitioned from INITING to RUNNING
2015-08-17 17:25:41,664 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: 
Container container_1439803298368_0005_01_01 transitioned from NEW to 
LOCALIZING
2015-08-17 17:25:41,664 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got 
event CONTAINER_INIT for appId application_1439803298368_0005
2015-08-17 17:25:41,664 INFO org.apache.spark.network.yarn.YarnShuffleService: 
Initializing container container_1439803298368_0005_01_01
2015-08-17 17:25:41,665 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/user/root/.sparkStaging/application_1439803298368_0005/spark-assembly-1.3.1-hadoop2.6.0.jar
 transitioned from INIT to DOWNLOADING
2015-08-17 17:25:41,665 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/user/root/.sparkStaging/application_1439803298368_0005/spark_Security-1.0-SNAPSHOT.jar
 transitioned from INIT to DOWNLOADING
2015-08-17 17:25:41,665 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Created localizer for container_1439803298368_0005_01_01
2015-08-17 17:25:41,668 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Writing credentials to the nmPrivate file 
/export/servers/hadoop2.6.0/tmp/nm-local-dir/nmPrivate/container_1439803298368_0005_01_01.tokens.
 Credentials list: 
2015-08-17 17:25:41,682 INFO 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: 
Initializing user root
2015-08-17 17:25:41,686 INFO 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying 
from 
/export/servers/hadoop2.6.0/tmp/nm-local-dir/nmPrivate/container_1439803298368_0005_01_01.tokens
 to 
/export/servers/hadoop2.6.0/tmp/nm-local-dir/usercache/root/appcache/application_1439803298368_0005/container_1439803298368_0005_01_01.tokens
2015-08-17 17:25:41,686 INFO 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Localizer 
CWD set t

[jira] [Assigned] (SPARK-9853) Optimize shuffle fetch of contiguous partition IDs

2015-08-20 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia reassigned SPARK-9853:


Assignee: Matei Zaharia

> Optimize shuffle fetch of contiguous partition IDs
> --
>
> Key: SPARK-9853
> URL: https://issues.apache.org/jira/browse/SPARK-9853
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Reporter: Matei Zaharia
>Assignee: Matei Zaharia
>Priority: Minor
>
> On the map side, we should be able to serve a block representing multiple 
> partition IDs in one block manager request



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10143) Parquet changed the behavior of calculating splits

2015-08-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10143:


Assignee: (was: Apache Spark)

> Parquet changed the behavior of calculating splits
> --
>
> Key: SPARK-10143
> URL: https://issues.apache.org/jira/browse/SPARK-10143
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Yin Huai
>Priority: Critical
>
> When Parquet's task side metadata is enabled (by default it is enabled and it 
> needs to be enabled to deal with tables with many files), Parquet delegates 
> the work of calculating initial splits to FileInputFormat (see 
> https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputFormat.java#L301-L311).
>  If filesystem's block size is smaller than the row group size and users do 
> not set min split size, splits in the initial split list will have lots of 
> dummy splits and they contribute to empty tasks (because the starting point 
> and ending point of a split does not cover the starting point of a row 
> group). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10143) Parquet changed the behavior of calculating splits

2015-08-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706045#comment-14706045
 ] 

Apache Spark commented on SPARK-10143:
--

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/8346

> Parquet changed the behavior of calculating splits
> --
>
> Key: SPARK-10143
> URL: https://issues.apache.org/jira/browse/SPARK-10143
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Yin Huai
>Priority: Critical
>
> When Parquet's task side metadata is enabled (by default it is enabled and it 
> needs to be enabled to deal with tables with many files), Parquet delegates 
> the work of calculating initial splits to FileInputFormat (see 
> https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputFormat.java#L301-L311).
>  If filesystem's block size is smaller than the row group size and users do 
> not set min split size, splits in the initial split list will have lots of 
> dummy splits and they contribute to empty tasks (because the starting point 
> and ending point of a split does not cover the starting point of a row 
> group). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10143) Parquet changed the behavior of calculating splits

2015-08-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10143:


Assignee: Apache Spark

> Parquet changed the behavior of calculating splits
> --
>
> Key: SPARK-10143
> URL: https://issues.apache.org/jira/browse/SPARK-10143
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Yin Huai
>Assignee: Apache Spark
>Priority: Critical
>
> When Parquet's task side metadata is enabled (by default it is enabled and it 
> needs to be enabled to deal with tables with many files), Parquet delegates 
> the work of calculating initial splits to FileInputFormat (see 
> https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputFormat.java#L301-L311).
>  If filesystem's block size is smaller than the row group size and users do 
> not set min split size, splits in the initial split list will have lots of 
> dummy splits and they contribute to empty tasks (because the starting point 
> and ending point of a split does not cover the starting point of a row 
> group). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8580) Add Parquet files generated by different systems to test interoperability and compatibility

2015-08-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706041#comment-14706041
 ] 

Apache Spark commented on SPARK-8580:
-

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/8341

> Add Parquet files generated by different systems to test interoperability and 
> compatibility
> ---
>
> Key: SPARK-8580
> URL: https://issues.apache.org/jira/browse/SPARK-8580
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>
> As we are implementing Parquet backwards-compatibility rules for Spark 1.5.0 
> to improve interoperability with other systems (reading non-standard Parquet 
> files they generate, and generating standard Parquet files), it would be good 
> to have a set of standard test Parquet files generated by various 
> systems/tools (parquet-thrift, parquet-avro, parquet-hive, Impala, and old 
> versions of Spark SQL) to ensure compatibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8580) Add Parquet files generated by different systems to test interoperability and compatibility

2015-08-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8580:
---

Assignee: Cheng Lian  (was: Apache Spark)

> Add Parquet files generated by different systems to test interoperability and 
> compatibility
> ---
>
> Key: SPARK-8580
> URL: https://issues.apache.org/jira/browse/SPARK-8580
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>
> As we are implementing Parquet backwards-compatibility rules for Spark 1.5.0 
> to improve interoperability with other systems (reading non-standard Parquet 
> files they generate, and generating standard Parquet files), it would be good 
> to have a set of standard test Parquet files generated by various 
> systems/tools (parquet-thrift, parquet-avro, parquet-hive, Impala, and old 
> versions of Spark SQL) to ensure compatibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8580) Add Parquet files generated by different systems to test interoperability and compatibility

2015-08-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8580:
---

Assignee: Apache Spark  (was: Cheng Lian)

> Add Parquet files generated by different systems to test interoperability and 
> compatibility
> ---
>
> Key: SPARK-8580
> URL: https://issues.apache.org/jira/browse/SPARK-8580
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Cheng Lian
>Assignee: Apache Spark
>
> As we are implementing Parquet backwards-compatibility rules for Spark 1.5.0 
> to improve interoperability with other systems (reading non-standard Parquet 
> files they generate, and generating standard Parquet files), it would be good 
> to have a set of standard test Parquet files generated by various 
> systems/tools (parquet-thrift, parquet-avro, parquet-hive, Impala, and old 
> versions of Spark SQL) to ensure compatibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10144) Actually show peak execution memory on UI by default

2015-08-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10144:


Assignee: Apache Spark  (was: Andrew Or)

> Actually show peak execution memory on UI by default
> 
>
> Key: SPARK-10144
> URL: https://issues.apache.org/jira/browse/SPARK-10144
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.5.0
>Reporter: Andrew Or
>Assignee: Apache Spark
>
> The peak execution memory metric was introduced in SPARK-8735. That was 
> before Tungsten was enabled by default, so it assumed that 
> `spark.sql.unsafe.enabled` must be explicitly set to true. This is no longer 
> the case...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10144) Actually show peak execution memory on UI by default

2015-08-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706033#comment-14706033
 ] 

Apache Spark commented on SPARK-10144:
--

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/8345

> Actually show peak execution memory on UI by default
> 
>
> Key: SPARK-10144
> URL: https://issues.apache.org/jira/browse/SPARK-10144
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.5.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> The peak execution memory metric was introduced in SPARK-8735. That was 
> before Tungsten was enabled by default, so it assumed that 
> `spark.sql.unsafe.enabled` must be explicitly set to true. This is no longer 
> the case...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10144) Actually show peak execution memory on UI by default

2015-08-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10144:


Assignee: Andrew Or  (was: Apache Spark)

> Actually show peak execution memory on UI by default
> 
>
> Key: SPARK-10144
> URL: https://issues.apache.org/jira/browse/SPARK-10144
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.5.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> The peak execution memory metric was introduced in SPARK-8735. That was 
> before Tungsten was enabled by default, so it assumed that 
> `spark.sql.unsafe.enabled` must be explicitly set to true. This is no longer 
> the case...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10144) Actually show peak execution memory on UI by default

2015-08-20 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-10144:
--
Summary: Actually show peak execution memory on UI by default  (was: 
Actually show peak execution memory by default)

> Actually show peak execution memory on UI by default
> 
>
> Key: SPARK-10144
> URL: https://issues.apache.org/jira/browse/SPARK-10144
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.5.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> The peak execution memory metric was introduced in SPARK-8735. That was 
> before Tungsten was enabled by default, so it assumed that 
> `spark.sql.unsafe.enabled` must be explicitly set to true. This is no longer 
> the case...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10144) Actually show peak execution memory by default

2015-08-20 Thread Andrew Or (JIRA)
Andrew Or created SPARK-10144:
-

 Summary: Actually show peak execution memory by default
 Key: SPARK-10144
 URL: https://issues.apache.org/jira/browse/SPARK-10144
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.5.0
Reporter: Andrew Or
Assignee: Andrew Or


The peak execution memory metric was introduced in SPARK-8735. That was before 
Tungsten was enabled by default, so it assumed that `spark.sql.unsafe.enabled` 
must be explicitly set to true. This is no longer the case...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8805) Spark shell not working

2015-08-20 Thread Amir Gur (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706025#comment-14706025
 ] 

Amir Gur commented on SPARK-8805:
-

You are right, thanks for the pointer.
That was the latest Git release I was using; it ships with bash 3.x - 
https://github.com/msysgit/msysgit/releases/tag/Git-1.9.5-preview20150319.

The newer one is at https://git-for-windows.github.io/, and 
https://github.com/git-for-windows/git/releases/tag/v2.5.0.windows.1 reports:
{quote}
$ bash --version
GNU bash, version 4.3.39(3)-release (x86_64-pc-msys)
Copyright (C) 2013 Free Software Foundation, Inc.
{quote}
And it does not have that issue, which looks good.
Still, it is not OK to depend on bash 4.x without a check that bails out right 
away and prints an appropriate message for bash 3.x users.

Now, after taking the latest branch-1.4 and a successful build/mvn clean compile 
on JDK 1.7, I get the following:
{quote}
$ bin/spark-shell
ls: cannot access /c/dev/github/apache/spark/assembly/target/scala-2.10: No 
such file or directory
Failed to find Spark assembly in 
/c/dev/github/apache/spark/assembly/target/scala-2.10.
You need to build Spark before running this program.
{quote}
It looks like something did not get built even though the build was marked as 
passed; not sure why. I will investigate further.



> Spark shell not working
> ---
>
> Key: SPARK-8805
> URL: https://issues.apache.org/jira/browse/SPARK-8805
> Project: Spark
>  Issue Type: Brainstorming
>  Components: Spark Core, Windows
>Reporter: Perinkulam I Ganesh
>
> I am using Git Bash on Windows, with OpenJDK 1.8.0_45 and Spark 1.4.0 installed.
> I am able to build Spark and install it, but whenever I execute spark-shell 
> it gives me the following error:
> $ spark-shell
> /c/.../spark/bin/spark-class: line 76: conditional binary operator expected



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4134) Dynamic allocation: tone down scary executor lost messages when killing on purpose

2015-08-20 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4134:
-
Target Version/s: 1.6.0  (was: 1.5.0)

> Dynamic allocation: tone down scary executor lost messages when killing on 
> purpose
> --
>
> Key: SPARK-4134
> URL: https://issues.apache.org/jira/browse/SPARK-4134
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> After SPARK-3822 goes in, we are now able to dynamically kill executors after 
> an application has started. However, when we do that we get a ton of scary 
> error messages telling us that we've done wrong somehow. It would be good to 
> detect when this is the case and prevent these messages from surfacing.
> This maybe difficult, however, because the connection manager tends to be 
> quite verbose in unconditionally logging disconnection messages. This is a 
> very nice-to-have for 1.2 but certainly not a blocker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-10143) Parquet changed the behavior of calculating splits

2015-08-20 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-10143:
-
Comment: was deleted

(was: For something quick, we can use the row group size set in hadoop conf to 
set the min split size.)

> Parquet changed the behavior of calculating splits
> --
>
> Key: SPARK-10143
> URL: https://issues.apache.org/jira/browse/SPARK-10143
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Yin Huai
>Priority: Critical
>
> When Parquet's task side metadata is enabled (by default it is enabled and it 
> needs to be enabled to deal with tables with many files), Parquet delegates 
> the work of calculating initial splits to FileInputFormat (see 
> https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputFormat.java#L301-L311).
>  If filesystem's block size is smaller than the row group size and users do 
> not set min split size, splits in the initial split list will have lots of 
> dummy splits and they contribute to empty tasks (because the starting point 
> and ending point of a split does not cover the starting point of a row 
> group). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10143) Parquet changed the behavior of calculating splits

2015-08-20 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705988#comment-14705988
 ] 

Yin Huai commented on SPARK-10143:
--

[~rdblue] Can you confirm the behavior change in Parquet? It looks like we are 
just asking FileInputFormat to give us the initial splits. I am thinking of using 
the current Parquet row group size setting as the filesystem min split size for the 
job. What do you think? Thanks :)
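A minimal sketch of that idea, assuming the standard property names parquet.block.size (Parquet's row group size) and mapreduce.input.fileinputformat.split.minsize (FileInputFormat's min split size); this is not the actual patch:
{code}
import org.apache.hadoop.conf.Configuration

// Sketch only: make the minimum split size at least the row group size so
// initial splits line up with row groups and we avoid empty tasks.
val hadoopConf = new Configuration()
val rowGroupSize = hadoopConf.getLong("parquet.block.size", 128L * 1024 * 1024)
hadoopConf.setLong("mapreduce.input.fileinputformat.split.minsize", rowGroupSize)
{code}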

> Parquet changed the behavior of calculating splits
> --
>
> Key: SPARK-10143
> URL: https://issues.apache.org/jira/browse/SPARK-10143
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Yin Huai
>Priority: Critical
>
> When Parquet's task side metadata is enabled (by default it is enabled and it 
> needs to be enabled to deal with tables with many files), Parquet delegates 
> the work of calculating initial splits to FileInputFormat (see 
> https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputFormat.java#L301-L311).
>  If filesystem's block size is smaller than the row group size and users do 
> not set min split size, splits in the initial split list will have lots of 
> dummy splits and they contribute to empty tasks (because the starting point 
> and ending point of a split does not cover the starting point of a row 
> group). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10143) Parquet changed the behavior of calculating splits

2015-08-20 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705985#comment-14705985
 ] 

Yin Huai commented on SPARK-10143:
--

For something quick, we can use the row group size set in hadoop conf to set 
the min split size.

> Parquet changed the behavior of calculating splits
> --
>
> Key: SPARK-10143
> URL: https://issues.apache.org/jira/browse/SPARK-10143
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Yin Huai
>Priority: Critical
>
> When Parquet's task side metadata is enabled (by default it is enabled and it 
> needs to be enabled to deal with tables with many files), Parquet delegates 
> the work of calculating initial splits to FileInputFormat (see 
> https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputFormat.java#L301-L311).
>  If filesystem's block size is smaller than the row group size and users do 
> not set min split size, splits in the initial split list will have lots of 
> dummy splits and they contribute to empty tasks (because the starting point 
> and ending point of a split does not cover the starting point of a row 
> group). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10143) Parquet changed the behavior of calculating splits

2015-08-20 Thread Yin Huai (JIRA)
Yin Huai created SPARK-10143:


 Summary: Parquet changed the behavior of calculating splits
 Key: SPARK-10143
 URL: https://issues.apache.org/jira/browse/SPARK-10143
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.5.0
Reporter: Yin Huai
Priority: Critical


When Parquet's task side metadata is enabled (by default it is enabled and it 
needs to be enabled to deal with tables with many files), Parquet delegates the 
work of calculating initial splits to FileInputFormat (see 
https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputFormat.java#L301-L311).
 If filesystem's block size is smaller than the row group size and users do not 
set min split size, splits in the initial split list will have lots of dummy 
splits and they contribute to empty tasks (because the starting point and 
ending point of a split does not cover the starting point of a row group). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10142) Python checkpoint recovery does not work with non-local file path

2015-08-20 Thread Tathagata Das (JIRA)
Tathagata Das created SPARK-10142:
-

 Summary: Python checkpoint recovery does not work with non-local 
file path
 Key: SPARK-10142
 URL: https://issues.apache.org/jira/browse/SPARK-10142
 Project: Spark
  Issue Type: Bug
  Components: PySpark, Streaming
Affects Versions: 1.4.1, 1.3.1
Reporter: Tathagata Das
Assignee: Tathagata Das
Priority: Critical






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9811) Test updated Kinesis Receiver

2015-08-20 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-9811.
--
Resolution: Fixed

> Test updated Kinesis Receiver
> -
>
> Key: SPARK-9811
> URL: https://issues.apache.org/jira/browse/SPARK-9811
> Project: Spark
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-9811) Test updated Kinesis Receiver

2015-08-20 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das reopened SPARK-9811:
--

> Test updated Kinesis Receiver
> -
>
> Key: SPARK-9811
> URL: https://issues.apache.org/jira/browse/SPARK-9811
> Project: Spark
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9811) Test updated Kinesis Receiver

2015-08-20 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-9811.
--
Resolution: Done

> Test updated Kinesis Receiver
> -
>
> Key: SPARK-9811
> URL: https://issues.apache.org/jira/browse/SPARK-9811
> Project: Spark
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10141) Number of tasks on executors still become negative after failures

2015-08-20 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-10141:
--
Description: 
I hit this failure when running LDA on EC2 (after I made the model size really 
big).

I was using the LDAExample.scala code on an EC2 cluster with 16 workers 
(r3.2xlarge), on a Wikipedia dataset:
{code}
Training set size (documents)   4534059
Vocabulary size (terms) 1
Training set size (tokens)  895575317
EM optimizer
1K topics
{code}

Failure message:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 55 in 
stage 22.0 failed 4 times, most recent failure: Lost task 55.3 in stage 22.0 
(TID 2881, 10.0.202.128): java.io.IOException: Failed to connect to 
/10.0.202.128:54740
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:193)
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
at 
org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:88)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /10.0.202.128:54740
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at 
io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at 
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more

Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1267)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1255)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1254)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1254)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:684)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:684)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:684)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1480)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1442)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1431)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:554)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1805)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1059)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
at org.apache.spark.rdd.RDD.fold(RDD.scala:1053)
at 
org.apache.spark.mllib.clustering.EMLDAOptimizer.computeGlobalTopicTotals(LDAOptimizer.scala:205)
at 
org.apache.spark.mllib.clustering.EMLDAOptimizer.next(LDAOptimizer.sc

[jira] [Commented] (SPARK-10141) Number of tasks on executors still become negative after failures

2015-08-20 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705881#comment-14705881
 ] 

Joseph K. Bradley commented on SPARK-10141:
---

Note this bug was found with a 1.5 version which includes the fix from 
[SPARK-8560]

> Number of tasks on executors still become negative after failures
> -
>
> Key: SPARK-10141
> URL: https://issues.apache.org/jira/browse/SPARK-10141
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.5.0
>Reporter: Joseph K. Bradley
>Priority: Minor
> Attachments: Screen Shot 2015-08-20 at 3.14.49 PM.png
>
>
> I hit this failure when running LDA on EC2 (after I made the model size 
> really big).
> Failure message:
> {code}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 55 in 
> stage 22.0 failed 4 times, most recent failure: Lost task 55.3 in stage 22.0 
> (TID 2881, 10.0.202.128): java.io.IOException: Failed to connect to 
> /10.0.202.128:54740
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:193)
>   at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
>   at 
> org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:88)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused: /10.0.202.128:54740
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>   at 
> io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
>   at 
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   ... 1 more
> Driver stacktrace:
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1267)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1255)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1254)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1254)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:684)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:684)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:684)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1480)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1442)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1431)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>   at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:554)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1805)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
>   at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1059)
>   at 
> org.apache.spark.rdd.RDDOperationScop

[jira] [Updated] (SPARK-10141) Number of tasks on executors still become negative after failures

2015-08-20 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-10141:
--
Attachment: Screen Shot 2015-08-20 at 3.14.49 PM.png

> Number of tasks on executors still become negative after failures
> -
>
> Key: SPARK-10141
> URL: https://issues.apache.org/jira/browse/SPARK-10141
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.5.0
>Reporter: Joseph K. Bradley
>Priority: Minor
> Attachments: Screen Shot 2015-08-20 at 3.14.49 PM.png
>
>
> I hit this failure when running LDA on EC2 (after I made the model size 
> really big).
> Failure message:
> {code}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 55 in 
> stage 22.0 failed 4 times, most recent failure: Lost task 55.3 in stage 22.0 
> (TID 2881, 10.0.202.128): java.io.IOException: Failed to connect to 
> /10.0.202.128:54740
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:193)
>   at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
>   at 
> org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:88)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused: /10.0.202.128:54740
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>   at 
> io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
>   at 
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   ... 1 more
> Driver stacktrace:
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1267)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1255)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1254)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1254)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:684)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:684)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:684)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1480)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1442)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1431)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>   at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:554)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1805)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
>   at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1059)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>   at 
> org.apache.spark.rdd.RDDOperat

[jira] [Updated] (SPARK-10141) Number of tasks on executors still become negative after failures

2015-08-20 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-10141:
--
Affects Version/s: 1.5.0

> Number of tasks on executors still become negative after failures
> -
>
> Key: SPARK-10141
> URL: https://issues.apache.org/jira/browse/SPARK-10141
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.5.0
>Reporter: Joseph K. Bradley
>Priority: Minor
> Attachments: Screen Shot 2015-08-20 at 3.14.49 PM.png
>
>
> I hit this failure when running LDA on EC2 (after I made the model size 
> really big).
> Failure message:
> {code}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 55 in 
> stage 22.0 failed 4 times, most recent failure: Lost task 55.3 in stage 22.0 
> (TID 2881, 10.0.202.128): java.io.IOException: Failed to connect to 
> /10.0.202.128:54740
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:193)
>   at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
>   at 
> org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:88)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused: /10.0.202.128:54740
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>   at 
> io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
>   at 
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   ... 1 more
> Driver stacktrace:
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1267)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1255)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1254)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1254)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:684)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:684)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:684)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1480)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1442)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1431)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>   at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:554)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1805)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
>   at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1059)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOpera

[jira] [Created] (SPARK-10141) Number of tasks on executors still become negative after failures

2015-08-20 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-10141:
-

 Summary: Number of tasks on executors still become negative after 
failures
 Key: SPARK-10141
 URL: https://issues.apache.org/jira/browse/SPARK-10141
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Reporter: Joseph K. Bradley
Priority: Minor


I hit this failure when running LDA on EC2 (after I made the model size really 
big).

Failure message:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 55 in 
stage 22.0 failed 4 times, most recent failure: Lost task 55.3 in stage 22.0 
(TID 2881, 10.0.202.128): java.io.IOException: Failed to connect to 
/10.0.202.128:54740
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:193)
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
at 
org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:88)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /10.0.202.128:54740
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at 
io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at 
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more

Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1267)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1255)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1254)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1254)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:684)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:684)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:684)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1480)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1442)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1431)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:554)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1805)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1059)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
at org.apache.spark.rdd.RDD.fold(RDD.scala:1053)
at 
org.apache.spark.mllib.clustering.EMLDAOptimizer.computeGlobalTopicTotals(LDAOptimizer.scala:205)
at 
org.apache.spark.mllib.clustering.EMLDAOptimizer.next(LDAOptimizer.scala:192)
at 
org.apache.spark.mllib.clustering.E

[jira] [Resolved] (SPARK-9400) Implement code generation for StringLocate

2015-08-20 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-9400.
---
   Resolution: Fixed
Fix Version/s: 1.6.0

Issue resolved by pull request 8330
[https://github.com/apache/spark/pull/8330]

> Implement code generation for StringLocate
> --
>
> Key: SPARK-9400
> URL: https://issues.apache.org/jira/browse/SPARK-9400
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
> Fix For: 1.6.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9245) DistributedLDAModel predict top topic per doc-term instance

2015-08-20 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-9245:
-
Target Version/s:   (was: 1.6.0)

> DistributedLDAModel predict top topic per doc-term instance
> ---
>
> Key: SPARK-9245
> URL: https://issues.apache.org/jira/browse/SPARK-9245
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
> Fix For: 1.5.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> For each (document, term) pair, return top topic.  Note that instances of 
> (doc, term) pairs within a document (a.k.a. "tokens") are exchangeable, so we 
> should provide an estimate per document-term, rather than per token.
> Synopsis for DistributedLDAModel:
> {code}
> /** @return RDD of (doc ID, vector of top topic index for each term) */
> def topTopicAssignments: RDD[(Long, Vector)]
> {code}
> Note that using Vector will let us have a sparse encoding which is 
> Java-friendly.
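A hypothetical usage sketch of the synopsis above (ldaModel stands for an already trained DistributedLDAModel; names are illustrative, not the final API):
{code}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Sketch only: one entry per document, a vector holding the top topic index
// for each term that occurs in the document.
val assignments: RDD[(Long, Vector)] = ldaModel.topTopicAssignments
assignments.take(3).foreach { case (docId, topTopicPerTerm) =>
  println(s"doc $docId -> $topTopicPerTerm")
}
{code}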



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4223) Support * (meaning all users) as part of the acls

2015-08-20 Thread Zhuo Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705851#comment-14705851
 ] 

Zhuo Liu commented on SPARK-4223:
-

Hi everyone, I am working on this and will submit a pull request soon.

> Support * (meaning all users) as part of the acls
> -
>
> Key: SPARK-4223
> URL: https://issues.apache.org/jira/browse/SPARK-4223
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Thomas Graves
>
> Currently we support setting view and modify acls but you have to specify a 
> list of users.  It would be nice to support * meaning all users have access.
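A minimal sketch of the proposed semantics (simplified helper, not the actual SecurityManager code):
{code}
// Sketch only: treat "*" in an ACL as matching every user.
def hasAccess(user: String, acl: Set[String]): Boolean =
  acl.contains("*") || acl.contains(user)

// hasAccess("alice", Set("*"))   => true
// hasAccess("alice", Set("bob")) => false
{code}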



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9245) DistributedLDAModel predict top topic per doc-term instance

2015-08-20 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-9245.
--
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 8329
[https://github.com/apache/spark/pull/8329]

> DistributedLDAModel predict top topic per doc-term instance
> ---
>
> Key: SPARK-9245
> URL: https://issues.apache.org/jira/browse/SPARK-9245
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
> Fix For: 1.5.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> For each (document, term) pair, return top topic.  Note that instances of 
> (doc, term) pairs within a document (a.k.a. "tokens") are exchangeable, so we 
> should provide an estimate per document-term, rather than per token.
> Synopsis for DistributedLDAModel:
> {code}
> /** @return RDD of (doc ID, vector of top topic index for each term) */
> def topTopicAssignments: RDD[(Long, Vector)]
> {code}
> Note that using Vector will let us have a sparse encoding which is 
> Java-friendly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-10108) Add @Since annotation to mllib.feature

2015-08-20 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-10108.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 8309
[https://github.com/apache/spark/pull/8309]

> Add @Since annotation to mllib.feature
> --
>
> Key: SPARK-10108
> URL: https://issues.apache.org/jira/browse/SPARK-10108
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, MLlib
>Reporter: Manoj Kumar
>Assignee: Manoj Kumar
>Priority: Minor
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-10138) Setters do not return self type in Java MultilayerPerceptronClassifier

2015-08-20 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-10138.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 8342
[https://github.com/apache/spark/pull/8342]

> Setters do not return self type in Java MultilayerPerceptronClassifier
> --
>
> Key: SPARK-10138
> URL: https://issues.apache.org/jira/browse/SPARK-10138
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.5.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>Priority: Blocker
> Fix For: 1.5.0
>
>
> We need to move setters to the final class instead.
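A simplified sketch of the pattern (hypothetical class names, not the actual ML code): when the setter only lives in a generic parent, Java sees the erased return type and cannot chain calls; redeclaring it in the final class exposes the concrete return type.
{code}
// Sketch only, with made-up names.
abstract class HasMaxIter[T <: HasMaxIter[T]] { self: T =>
  protected var maxIter: Int = 100
  def setMaxIter(value: Int): T = { maxIter = value; this }
}

final class MyClassifier extends HasMaxIter[MyClassifier] {
  // Redeclared here so the Java-visible signature returns MyClassifier,
  // letting Java code chain new MyClassifier().setMaxIter(50) safely.
  override def setMaxIter(value: Int): MyClassifier = { maxIter = value; this }
}
{code}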



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10140) Add target fields to @Since annotation

2015-08-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10140:


Assignee: Xiangrui Meng  (was: Apache Spark)

> Add target fields to @Since annotation
> --
>
> Key: SPARK-10140
> URL: https://issues.apache.org/jira/browse/SPARK-10140
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>
> Add target fields to @Since so constructor params and fields also get 
> annotated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10140) Add target fields to @Since annotation

2015-08-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705819#comment-14705819
 ] 

Apache Spark commented on SPARK-10140:
--

User 'mengxr' has created a pull request for this issue:
https://github.com/apache/spark/pull/8344

> Add target fields to @Since annotation
> --
>
> Key: SPARK-10140
> URL: https://issues.apache.org/jira/browse/SPARK-10140
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>
> Add target fields to @Since so constructor params and fields also get 
> annotated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10140) Add target fields to @Since annotation

2015-08-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10140:


Assignee: Apache Spark  (was: Xiangrui Meng)

> Add target fields to @Since annotation
> --
>
> Key: SPARK-10140
> URL: https://issues.apache.org/jira/browse/SPARK-10140
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Xiangrui Meng
>Assignee: Apache Spark
>
> Add target fields to @Since so constructor params and fields also get 
> annotated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10140) Add target fields to @Since annotation

2015-08-20 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-10140:
-

 Summary: Add target fields to @Since annotation
 Key: SPARK-10140
 URL: https://issues.apache.org/jira/browse/SPARK-10140
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.5.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng


Add target fields to @Since so constructor params and fields also get annotated.
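A sketch of what this could look like using scala.annotation.meta targets (not necessarily the exact change that will be made):
{code}
import scala.annotation.meta.{field, getter, param}

// Sketch only: meta-annotations on the annotation class widen its default
// targets, so @Since placed on a constructor parameter is also carried to
// the generated field and getter.
@param @field @getter
class Since(val version: String) extends scala.annotation.StaticAnnotation

class Example(@Since("1.5.0") val maxIter: Int)
{code}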



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5556) Latent Dirichlet Allocation (LDA) using Gibbs sampler

2015-08-20 Thread Jason Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705810#comment-14705810
 ] 

Jason Dai commented on SPARK-5556:
--

[~pedrorodriguez] We'll try to make a Spark package based on our repo; please 
take a look at the code and provide your feedback. Please let us know if 
there is anything we can collaborate on for LDA/topic modeling on Spark.

> Latent Dirichlet Allocation (LDA) using Gibbs sampler 
> --
>
> Key: SPARK-5556
> URL: https://issues.apache.org/jira/browse/SPARK-5556
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Guoqiang Li
>Assignee: Pedro Rodriguez
> Attachments: LDA_test.xlsx, spark-summit.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10139) java.util.NoSuchElementException

2015-08-20 Thread dan young (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dan young updated SPARK-10139:
--
Description: 
I cloned the Spark repo and am running spark-ec2/spark_ec2.py on branch-1.4.
I start up a Spark 1.4.1 cluster, and then try to run/connect to ThriftServer2.

When I try to connect via Beeline, or via JDBC through ThriftServer2, I'm 
getting a java.util.NoSuchElementException with any command.  Example: !sql 
show tables

Also, if I just connect via SparkSQL, i.e. ./spark/bin/spark-sql   I'm able 
to run show tables, queries, etc.


Here is the ThriftServer Log:
https://www.refheap.com/c043705ce7978c16c9cda4e12

I run the same code/examples with Spark 1.3.1 and I'm able to connect via 
Beeline and/or JDBC through ThriftServer2, and run queries, etc.


Someone else seems to have the same issue:

http://stackoverflow.com/questions/31984057/unable-to-see-hive-tables-from-beeline-in-spark-version-1-4-0



  was:
cloned Spark repo and am running the spark-ec2/spark_ec2.py on branch-1.4.
Startup a Spark 1.4.1 cluster, and then try to run/connect to the ThriftServer2.

When I try to connect via Beeline, or via JDBC thru the ThriftServer2, I'm 
getting an java.util.NoSuchElementException with any command.  Example: !sql 
show tables

Here is the ThriftServer Log:

https://www.refheap.com/c043705ce7978c16c9cda4e12

I run the same code/examples with Spark 1.3.1 and I'm able to connect via 
BeeLine and/or JDBC thru the ThriftServer2.  Also, If I just connect via 
SparkSQL, i.e ./spark/bin/spark-sql   I'm able to run show tables, 
queries,etc


Someone else seems to have the same issue:

http://stackoverflow.com/questions/31984057/unable-to-see-hive-tables-from-beeline-in-spark-version-1-4-0




> java.util.NoSuchElementException
> 
>
> Key: SPARK-10139
> URL: https://issues.apache.org/jira/browse/SPARK-10139
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.1
> Environment: using the spark-ec2/spark_ec2.py scripts to launch spark 
> cluster in AWS.
>Reporter: dan young
>
> cloned Spark repo and am running the spark-ec2/spark_ec2.py on branch-1.4.
> Startup a Spark 1.4.1 cluster, and then try to run/connect to the 
> ThriftServer2.
> When I try to connect via Beeline, or via JDBC thru the ThriftServer2, I'm 
> getting an java.util.NoSuchElementException with any command.  Example: !sql 
> show tables
> Also, If I just connect via SparkSQL, i.e ./spark/bin/spark-sql   I'm 
> able to run show tables, queries,etc
> Here is the ThriftServer Log:
> https://www.refheap.com/c043705ce7978c16c9cda4e12
> I run the same code/examples with Spark 1.3.1 and I'm able to connect via 
> BeeLine and/or JDBC thru the ThriftServer2, and run queries, etc
> Someone else seems to have the same issue:
> http://stackoverflow.com/questions/31984057/unable-to-see-hive-tables-from-beeline-in-spark-version-1-4-0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10139) java.util.NoSuchElementException

2015-08-20 Thread dan young (JIRA)
dan young created SPARK-10139:
-

 Summary: java.util.NoSuchElementException
 Key: SPARK-10139
 URL: https://issues.apache.org/jira/browse/SPARK-10139
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.4.1
 Environment: using the spark-ec2/spark_ec2.py scripts to launch spark 
cluster in AWS.
Reporter: dan young


I cloned the Spark repo and am running spark-ec2/spark_ec2.py on branch-1.4.
I start up a Spark 1.4.1 cluster, and then try to run/connect to ThriftServer2.

When I try to connect via Beeline, or via JDBC through ThriftServer2, I'm 
getting a java.util.NoSuchElementException with any command.  Example: !sql 
show tables

Here is the ThriftServer log:

https://www.refheap.com/c043705ce7978c16c9cda4e12

I run the same code/examples with Spark 1.3.1 and I'm able to connect via 
Beeline and/or JDBC through ThriftServer2.  Also, if I just connect via 
SparkSQL, i.e. ./spark/bin/spark-sql   I'm able to run show tables, 
queries, etc.


Someone else seems to have the same issue:

http://stackoverflow.com/questions/31984057/unable-to-see-hive-tables-from-beeline-in-spark-version-1-4-0





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8952) JsonFile() of SQLContext display improper warning message for a S3 path

2015-08-20 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8952:
---

Assignee: (was: Apache Spark)

> JsonFile() of SQLContext display improper warning message for a S3 path
> ---
>
> Key: SPARK-8952
> URL: https://issues.apache.org/jira/browse/SPARK-8952
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.4.0
>Reporter: Sun Rui
>
> This is an issue reported by Ben Spark .
> {quote}
> Spark 1.4 deployed on AWS EMR 
> "jsonFile" is working though with some warning message
> Warning message:
> In normalizePath(path) :
>   
> path[1]="s3://rea-consumer-data-dev/cbr/profiler/output/20150618/part-0": 
> No such file or directory
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8952) JsonFile() of SQLContext display improper warning message for a S3 path

2015-08-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705616#comment-14705616
 ] 

Apache Spark commented on SPARK-8952:
-

User 'lresende' has created a pull request for this issue:
https://github.com/apache/spark/pull/8343

> JsonFile() of SQLContext display improper warning message for a S3 path
> ---
>
> Key: SPARK-8952
> URL: https://issues.apache.org/jira/browse/SPARK-8952
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.4.0
>Reporter: Sun Rui
>
> This is an issue reported by Ben Spark .
> {quote}
> Spark 1.4 deployed on AWS EMR 
> "jsonFile" is working though with some warning message
> Warning message:
> In normalizePath(path) :
>   
> path[1]="s3://rea-consumer-data-dev/cbr/profiler/output/20150618/part-0": 
> No such file or directory
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


