[jira] [Assigned] (SPARK-6765) Turn scalastyle on for test code
[ https://issues.apache.org/jira/browse/SPARK-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin reassigned SPARK-6765: -- Assignee: Reynold Xin > Turn scalastyle on for test code > > > Key: SPARK-6765 > URL: https://issues.apache.org/jira/browse/SPARK-6765 > Project: Spark > Issue Type: Improvement >Reporter: Reynold Xin >Assignee: Reynold Xin > > We should turn scalastyle on for test code. Test code should be as important > as main code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-6762) Fix potential resource leaks in CheckPoint CheckpointWriter and CheckpointReader
[ https://issues.apache.org/jira/browse/SPARK-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6762: --- Assignee: Apache Spark > Fix potential resource leaks in CheckPoint CheckpointWriter and > CheckpointReader > > > Key: SPARK-6762 > URL: https://issues.apache.org/jira/browse/SPARK-6762 > Project: Spark > Issue Type: Bug > Components: Streaming >Reporter: zhichao-li >Assignee: Apache Spark >Priority: Minor > > The close action should be placed within finally block to avoid the potential > resource leaks -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-6762) Fix potential resource leaks in CheckPoint CheckpointWriter and CheckpointReader
[ https://issues.apache.org/jira/browse/SPARK-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6762: --- Assignee: (was: Apache Spark) > Fix potential resource leaks in CheckPoint CheckpointWriter and > CheckpointReader > > > Key: SPARK-6762 > URL: https://issues.apache.org/jira/browse/SPARK-6762 > Project: Spark > Issue Type: Bug > Components: Streaming >Reporter: zhichao-li >Priority: Minor > > The close action should be placed within finally block to avoid the potential > resource leaks -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6762) Fix potential resource leaks in CheckPoint CheckpointWriter and CheckpointReader
[ https://issues.apache.org/jira/browse/SPARK-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484875#comment-14484875 ] Apache Spark commented on SPARK-6762: - User 'nareshpr' has created a pull request for this issue: https://github.com/apache/spark/pull/5413 > Fix potential resource leaks in CheckPoint CheckpointWriter and > CheckpointReader > > > Key: SPARK-6762 > URL: https://issues.apache.org/jira/browse/SPARK-6762 > Project: Spark > Issue Type: Bug > Components: Streaming >Reporter: zhichao-li >Priority: Minor > > The close action should be placed within finally block to avoid the potential > resource leaks -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
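The fix SPARK-6762 describes is the standard try/finally close pattern. A minimal sketch, using a stand-in class rather than Spark's actual CheckpointWriter/CheckpointReader (`FakeCheckpointStream` is hypothetical): the resource is closed even when the write throws, which is exactly the leak the issue is about.

```scala
import java.io.{Closeable, IOException}

// Stand-in for the checkpoint stream; not Spark's real class.
class FakeCheckpointStream extends Closeable {
  var closed = false
  def write(data: Array[Byte]): Unit =
    if (data.isEmpty) throw new IOException("empty checkpoint")
  override def close(): Unit = closed = true
}

def writeCheckpoint(stream: FakeCheckpointStream, data: Array[Byte]): Unit =
  try {
    stream.write(data)
  } finally {
    stream.close() // runs on success and on failure alike, avoiding the leak
  }
```

If `close()` were called only after a successful `write`, the failing path would leave the stream open; the finally block removes that path.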
[jira] [Created] (SPARK-6766) StreamingListenerBatchSubmitted isn't sent and StreamingListenerBatchStarted.batchInfo.processingStartTime is a wrong value
Shixiong Zhu created SPARK-6766: --- Summary: StreamingListenerBatchSubmitted isn't sent and StreamingListenerBatchStarted.batchInfo.processingStartTime is a wrong value Key: SPARK-6766 URL: https://issues.apache.org/jira/browse/SPARK-6766 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.3.0, 1.2.1, 1.1.1, 1.0.2 Reporter: Shixiong Zhu 1. Currently nothing posts StreamingListenerBatchSubmitted; it should be posted when the JobSet is submitted. 2. Calling JobSet.handleJobStart before posting StreamingListenerBatchStarted leaves StreamingListenerBatchStarted.batchInfo.processingStartTime as None, when it should have been set to the correct value. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6766) StreamingListenerBatchSubmitted isn't sent and StreamingListenerBatchStarted.batchInfo.processingStartTime is a wrong value
[ https://issues.apache.org/jira/browse/SPARK-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484898#comment-14484898 ] Apache Spark commented on SPARK-6766: - User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/5414 > StreamingListenerBatchSubmitted isn't sent and > StreamingListenerBatchStarted.batchInfo.processingStartTime is a wrong value > --- > > Key: SPARK-6766 > URL: https://issues.apache.org/jira/browse/SPARK-6766 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.0.2, 1.1.1, 1.2.1, 1.3.0 >Reporter: Shixiong Zhu > > 1. Now there is no place posting StreamingListenerBatchSubmitted. It should > be post when JobSet is submitted. > 2. Call JobSet.handleJobStart before posting StreamingListenerBatchStarted > will set StreamingListenerBatchStarted.batchInfo.processingStartTime to None, > which should have been set to a correct value. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-6766) StreamingListenerBatchSubmitted isn't sent and StreamingListenerBatchStarted.batchInfo.processingStartTime is a wrong value
[ https://issues.apache.org/jira/browse/SPARK-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6766: --- Assignee: (was: Apache Spark) > StreamingListenerBatchSubmitted isn't sent and > StreamingListenerBatchStarted.batchInfo.processingStartTime is a wrong value > --- > > Key: SPARK-6766 > URL: https://issues.apache.org/jira/browse/SPARK-6766 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.0.2, 1.1.1, 1.2.1, 1.3.0 >Reporter: Shixiong Zhu > > 1. Now there is no place posting StreamingListenerBatchSubmitted. It should > be post when JobSet is submitted. > 2. Call JobSet.handleJobStart before posting StreamingListenerBatchStarted > will set StreamingListenerBatchStarted.batchInfo.processingStartTime to None, > which should have been set to a correct value. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-6766) StreamingListenerBatchSubmitted isn't sent and StreamingListenerBatchStarted.batchInfo.processingStartTime is a wrong value
[ https://issues.apache.org/jira/browse/SPARK-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6766: --- Assignee: Apache Spark > StreamingListenerBatchSubmitted isn't sent and > StreamingListenerBatchStarted.batchInfo.processingStartTime is a wrong value > --- > > Key: SPARK-6766 > URL: https://issues.apache.org/jira/browse/SPARK-6766 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.0.2, 1.1.1, 1.2.1, 1.3.0 >Reporter: Shixiong Zhu >Assignee: Apache Spark > > 1. Now there is no place posting StreamingListenerBatchSubmitted. It should > be post when JobSet is submitted. > 2. Call JobSet.handleJobStart before posting StreamingListenerBatchStarted > will set StreamingListenerBatchStarted.batchInfo.processingStartTime to None, > which should have been set to a correct value. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
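The two fixes described in SPARK-6766 can be sketched with simplified stand-ins (these are not Spark's real JobScheduler, BatchInfo, or StreamingListenerBus classes, just hypothetical cut-down versions): post a Submitted event at the moment the JobSet is queued, and fill in processingStartTime *before* posting the Started event so listeners never observe None.

```scala
import scala.collection.mutable.ArrayBuffer

// Cut-down stand-ins for the streaming listener machinery.
case class BatchInfo(batchTime: Long, processingStartTime: Option[Long])

sealed trait StreamingEvent
case class BatchSubmitted(info: BatchInfo) extends StreamingEvent
case class BatchStarted(info: BatchInfo) extends StreamingEvent

class FakeListenerBus {
  val events = ArrayBuffer.empty[StreamingEvent]
  def post(e: StreamingEvent): Unit = events += e
}

def submitJobSet(bus: FakeListenerBus, batchTime: Long): BatchInfo = {
  val info = BatchInfo(batchTime, processingStartTime = None)
  bus.post(BatchSubmitted(info)) // fix 1: previously this was never posted
  info
}

def handleJobStart(bus: FakeListenerBus, info: BatchInfo, now: Long): BatchInfo = {
  val started = info.copy(processingStartTime = Some(now)) // fix 2: set first...
  bus.post(BatchStarted(started))                          // ...then post
  started
}
```

The bug's ordering (post first, set processingStartTime afterwards) would hand listeners a BatchStarted whose processingStartTime is still None.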
[jira] [Commented] (SPARK-6762) Fix potential resource leaks in CheckPoint CheckpointWriter and CheckpointReader
[ https://issues.apache.org/jira/browse/SPARK-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484900#comment-14484900 ] Apache Spark commented on SPARK-6762: - User 'zhichao-li' has created a pull request for this issue: https://github.com/apache/spark/pull/5407 > Fix potential resource leaks in CheckPoint CheckpointWriter and > CheckpointReader > > > Key: SPARK-6762 > URL: https://issues.apache.org/jira/browse/SPARK-6762 > Project: Spark > Issue Type: Bug > Components: Streaming >Reporter: zhichao-li >Priority: Minor > > The close action should be placed within finally block to avoid the potential > resource leaks -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6767) Documentation error in Spark SQL Readme file
Tijo Thomas created SPARK-6767: -- Summary: Documentation error in Spark SQL Readme file Key: SPARK-6767 URL: https://issues.apache.org/jira/browse/SPARK-6767 Project: Spark Issue Type: Bug Components: Documentation, SQL Affects Versions: 1.3.0 Reporter: Tijo Thomas Priority: Trivial There is an error in the Spark SQL documentation file: the sample script for the SQL DSL throws the error below. scala> query.where('key > 30).select(avg('key)).collect() :43: error: value > is not a member of Symbol query.where('key > 30).select(avg('key)).collect() -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-6767) Documentation error in Spark SQL Readme file
[ https://issues.apache.org/jira/browse/SPARK-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6767: --- Assignee: Apache Spark > Documentation error in Spark SQL Readme file > > > Key: SPARK-6767 > URL: https://issues.apache.org/jira/browse/SPARK-6767 > Project: Spark > Issue Type: Bug > Components: Documentation, SQL >Affects Versions: 1.3.0 >Reporter: Tijo Thomas >Assignee: Apache Spark >Priority: Trivial > > Error in Spark SQL Documentation file . The sample script for SQL DSL > throwing below error > scala> query.where('key > 30).select(avg('key)).collect() > :43: error: value > is not a member of Symbol > query.where('key > 30).select(avg('key)).collect() -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-6767) Documentation error in Spark SQL Readme file
[ https://issues.apache.org/jira/browse/SPARK-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6767: --- Assignee: (was: Apache Spark) > Documentation error in Spark SQL Readme file > > > Key: SPARK-6767 > URL: https://issues.apache.org/jira/browse/SPARK-6767 > Project: Spark > Issue Type: Bug > Components: Documentation, SQL >Affects Versions: 1.3.0 >Reporter: Tijo Thomas >Priority: Trivial > > Error in Spark SQL Documentation file . The sample script for SQL DSL > throwing below error > scala> query.where('key > 30).select(avg('key)).collect() > :43: error: value > is not a member of Symbol > query.where('key > 30).select(avg('key)).collect() -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6767) Documentation error in Spark SQL Readme file
[ https://issues.apache.org/jira/browse/SPARK-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484946#comment-14484946 ] Apache Spark commented on SPARK-6767: - User 'tijoparacka' has created a pull request for this issue: https://github.com/apache/spark/pull/5415 > Documentation error in Spark SQL Readme file > > > Key: SPARK-6767 > URL: https://issues.apache.org/jira/browse/SPARK-6767 > Project: Spark > Issue Type: Bug > Components: Documentation, SQL >Affects Versions: 1.3.0 >Reporter: Tijo Thomas >Priority: Trivial > > Error in Spark SQL Documentation file . The sample script for SQL DSL > throwing below error > scala> query.where('key > 30).select(avg('key)).collect() > :43: error: value > is not a member of Symbol > query.where('key > 30).select(avg('key)).collect() -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
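For context on why the README snippet fails: `Symbol` itself defines no `>` method; Spark's expression DSL adds one through an implicit conversion that must be in scope. The stand-in below (not Spark's real API; `FakeDsl` and `Predicate` are hypothetical names) reproduces the mechanism. Without `import FakeDsl._`, the comparison fails to compile with exactly "value > is not a member of Symbol", the error quoted in the report.

```scala
// Hypothetical result type for a column predicate.
case class Predicate(column: String, op: String, value: Int)

object FakeDsl {
  // The implicit conversion that grants Symbol a `>` method.
  implicit class SymbolOps(s: Symbol) {
    def >(v: Int): Predicate = Predicate(s.name, ">", v)
  }
}

import FakeDsl._
// The README writes this as 'key > 30; Symbol("key") is the
// literal-free spelling of the same thing.
val predicate = Symbol("key") > 30
```

So the documented snippet only works in a session where the DSL's implicits have been imported; run bare, it produces the reported compile error.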
[jira] [Created] (SPARK-6768) Do not support "float/double union decimal and decimal(a ,b) union decimal(c, d)"
Zhongshuai Pei created SPARK-6768: - Summary: Do not support "float/double union decimal and decimal(a ,b) union decimal(c, d)" Key: SPARK-6768 URL: https://issues.apache.org/jira/browse/SPARK-6768 Project: Spark Issue Type: Bug Components: SQL Reporter: Zhongshuai Pei -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6769) Usage of the ListenerBus in YarnClusterSuite is wrong
Kousuke Saruta created SPARK-6769: - Summary: Usage of the ListenerBus in YarnClusterSuite is wrong Key: SPARK-6769 URL: https://issues.apache.org/jira/browse/SPARK-6769 Project: Spark Issue Type: Bug Components: Spark Core, Tests, YARN Affects Versions: 1.4.0 Reporter: Kousuke Saruta Priority: Minor In YarnClusterSuite, a test case uses `SaveExecutorInfo` to handle ExecutorAddedEvent as follows.
{code}
private class SaveExecutorInfo extends SparkListener {
  val addedExecutorInfos = mutable.Map[String, ExecutorInfo]()

  override def onExecutorAdded(executor: SparkListenerExecutorAdded) {
    addedExecutorInfos(executor.executorId) = executor.executorInfo
  }
}

...

listener = new SaveExecutorInfo
val sc = new SparkContext(new SparkConf()
  .setAppName("yarn \"test app\" 'with quotes' and \\back\\slashes and $dollarSigns"))
sc.addSparkListener(listener)
val status = new File(args(0))
var result = "failure"
try {
  val data = sc.parallelize(1 to 4, 4).collect().toSet
  assert(sc.listenerBus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS))
  data should be (Set(1, 2, 3, 4))
  result = "success"
} finally {
  sc.stop()
  Files.write(result, status, UTF_8)
}
{code}
But this usage is wrong: executors spawn while the SparkContext is initializing, whereas SparkContext#addSparkListener can only be invoked after that initialization, and thus after the executors have already spawned, so SaveExecutorInfo never receives the ExecutorAddedEvent. The following code checks the result of handling ExecutorAddedEvent; for the reason above, the assertion can never pass.
{code}
// verify log urls are present
listener.addedExecutorInfos.values.foreach { info =>
  assert(info.logUrlMap.nonEmpty)
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
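The race SPARK-6769 describes can be sketched with stand-in classes (`FakeContext` and `Listener` below are hypothetical, not the real SparkContext or SparkListener): events fired during construction are lost to any listener registered only after the constructor returns.

```scala
import scala.collection.mutable.ArrayBuffer

case class ExecutorAdded(executorId: String)

trait Listener { def onExecutorAdded(e: ExecutorAdded): Unit }

class FakeContext {
  private val listeners = ArrayBuffer.empty[Listener]
  private def post(e: ExecutorAdded): Unit =
    listeners.foreach(_.onExecutorAdded(e))

  // "Executors spawn" during construction, before any external
  // addListener call can possibly run.
  post(ExecutorAdded("1"))
  post(ExecutorAdded("2"))

  def addListener(l: Listener): Unit = listeners += l
}

class SaveExecutorInfo extends Listener {
  val added = ArrayBuffer.empty[String]
  def onExecutorAdded(e: ExecutorAdded): Unit = added += e.executorId
}
```

With this setup, `new FakeContext` followed by `addListener(new SaveExecutorInfo)` leaves the listener's `added` buffer empty, mirroring why the suite's assertion can never pass. A likely remedy is registering the listener through configuration that is processed during initialization (Spark's `spark.extraListeners` setting exists for this purpose) rather than calling addSparkListener afterwards.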
[jira] [Updated] (SPARK-6768) Do not support "float/double union decimal and decimal(a ,b) union decimal(c, d)"
[ https://issues.apache.org/jira/browse/SPARK-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhongshuai Pei updated SPARK-6768: -- Description: SQL like the following is not supported:
{code}
select cast(12.2056999 as float) from testData limit 1
union
select cast(12.2041 as decimal(7, 4)) from testData limit 1
{code}
{code}
select cast(12.2056999 as double) from testData limit 1
union
select cast(12.2041 as decimal(7, 4)) from testData limit 1
{code}
{code}
select cast(1241.20 as decimal(6, 2)) from testData limit 1
union
select cast(1.204 as decimal(5, 3)) from testData limit 1
{code}
> Do not support "float/double union decimal and decimal(a ,b) union decimal(c, d)"
> -
>
> Key: SPARK-6768
> URL: https://issues.apache.org/jira/browse/SPARK-6768
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Zhongshuai Pei
>
> SQL like the following is not supported:
> {code}
> select cast(12.2056999 as float) from testData limit 1
> union
> select cast(12.2041 as decimal(7, 4)) from testData limit 1
> {code}
> {code}
> select cast(12.2056999 as double) from testData limit 1
> union
> select cast(12.2041 as decimal(7, 4)) from testData limit 1
> {code}
> {code}
> select cast(1241.20 as decimal(6, 2)) from testData limit 1
> union
> select cast(1.204 as decimal(5, 3)) from testData limit 1
> {code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1499) Workers continuously produce failing executors
[ https://issues.apache.org/jira/browse/SPARK-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484962#comment-14484962 ] XeonZhao commented on SPARK-1499: - How to solve this problem ?I have encountered this issue. > Workers continuously produce failing executors > -- > > Key: SPARK-1499 > URL: https://issues.apache.org/jira/browse/SPARK-1499 > Project: Spark > Issue Type: Bug > Components: Deploy, Spark Core >Affects Versions: 0.9.1, 1.0.0 >Reporter: Aaron Davidson > > If a node is in a bad state, such that newly started executors fail on > startup or first use, the Standalone Cluster Worker will happily keep > spawning new ones. A better behavior would be for a Worker to mark itself as > dead if it has had a history of continuously producing erroneous executors, > or else to somehow prevent a driver from re-registering executors from the > same machine repeatedly. > Reported on mailing list: > http://mail-archives.apache.org/mod_mbox/spark-user/201404.mbox/%3ccal8t0bqjfgtf-vbzjq6yj7ckbl_9p9s0trvew2mvg6zbngx...@mail.gmail.com%3E > Relevant logs: > {noformat} > 14/04/11 19:06:52 INFO client.AppClient$ClientActor: Executor updated: > app-20140411190649-0008/4 is now FAILED (Command exited with code 53) > 14/04/11 19:06:52 INFO cluster.SparkDeploySchedulerBackend: Executor > app-20140411190649-0008/4 removed: Command exited with code 53 > 14/04/11 19:06:52 INFO cluster.SparkDeploySchedulerBackend: Executor 4 > disconnected, so removing it > 14/04/11 19:06:52 ERROR scheduler.TaskSchedulerImpl: Lost an executor 4 > (already removed): Failed to create local directory (bad spark.local.dir?) 
> 14/04/11 19:06:52 INFO client.AppClient$ClientActor: Executor added: > app-20140411190649-0008/27 on > worker-20140409212012-ip-172-31-19-11.us-west-1.compute.internal-58614 > (ip-172-31-19-11.us-west-1.compute.internal:58614) with 8 cores > 14/04/11 19:06:52 INFO cluster.SparkDeploySchedulerBackend: Granted executor > ID app-20140411190649-0008/27 on hostPort > ip-172-31-19-11.us-west-1.compute.internal:58614 with 8 cores, 56.9 GB RAM > 14/04/11 19:06:52 INFO client.AppClient$ClientActor: Executor updated: > app-20140411190649-0008/27 is now RUNNING > 14/04/11 19:06:52 INFO storage.BlockManagerMasterActor$BlockManagerInfo: > Registering block manager ip-172-31-24-76.us-west-1.compute.internal:50256 > with 32.7 GB RAM > 14/04/11 19:06:52 INFO metastore.HiveMetaStore: 0: get_table : db=default > tbl=wikistats_pd > 14/04/11 19:06:52 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr > cmd=get_table : db=default tbl=wikistats_pd > 14/04/11 19:06:53 DEBUG hive.log: DDL: struct wikistats_pd { string > projectcode, string pagename, i32 pageviews, i32 bytes} > 14/04/11 19:06:53 DEBUG lazy.LazySimpleSerDe: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe initialized with: > columnNames=[projectcode, pagename, pageviews, bytes] columnTypes=[string, > string, int, int] separator=[[B@29a81175] nullstring=\N > lastColumnTakesRest=false > shark> 14/04/11 19:06:55 INFO cluster.SparkDeploySchedulerBackend: Registered > executor: > Actor[akka.tcp://sparkexecu...@ip-172-31-19-11.us-west-1.compute.internal:45248/user/Executor#-1002203295] > with ID 27 > show 14/04/11 19:06:56 INFO cluster.SparkDeploySchedulerBackend: Executor 27 > disconnected, so removing it > 14/04/11 19:06:56 ERROR scheduler.TaskSchedulerImpl: Lost an executor 27 > (already removed): remote Akka client disassociated > 14/04/11 19:06:56 INFO client.AppClient$ClientActor: Executor updated: > app-20140411190649-0008/27 is now FAILED (Command exited with code 53) > 14/04/11 19:06:56 INFO 
cluster.SparkDeploySchedulerBackend: Executor > app-20140411190649-0008/27 removed: Command exited with code 53 > 14/04/11 19:06:56 INFO client.AppClient$ClientActor: Executor added: > app-20140411190649-0008/28 on > worker-20140409212012-ip-172-31-19-11.us-west-1.compute.internal-58614 > (ip-172-31-19-11.us-west-1.compute.internal:58614) with 8 cores > 14/04/11 19:06:56 INFO cluster.SparkDeploySchedulerBackend: Granted executor > ID app-20140411190649-0008/28 on hostPort > ip-172-31-19-11.us-west-1.compute.internal:58614 with 8 cores, 56.9 GB RAM > 14/04/11 19:06:56 INFO client.AppClient$ClientActor: Executor updated: > app-20140411190649-0008/28 is now RUNNING > tables; > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6769) Usage of the ListenerBus in YarnClusterSuite is wrong
[ https://issues.apache.org/jira/browse/SPARK-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484966#comment-14484966 ] Apache Spark commented on SPARK-6769: - User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/5417 > Usage of the ListenerBus in YarnClusterSuite is wrong > - > > Key: SPARK-6769 > URL: https://issues.apache.org/jira/browse/SPARK-6769 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests, YARN >Affects Versions: 1.4.0 >Reporter: Kousuke Saruta >Priority: Minor > > In YarnClusterSuite, a test case uses `SaveExecutorInfo` to handle > ExecutorAddedEvent as follows. > {code} > private class SaveExecutorInfo extends SparkListener { > val addedExecutorInfos = mutable.Map[String, ExecutorInfo]() > override def onExecutorAdded(executor: SparkListenerExecutorAdded) { > addedExecutorInfos(executor.executorId) = executor.executorInfo > } > } > ... > listener = new SaveExecutorInfo > val sc = new SparkContext(new SparkConf() > .setAppName("yarn \"test app\" 'with quotes' and \\back\\slashes and > $dollarSigns")) > sc.addSparkListener(listener) > val status = new File(args(0)) > var result = "failure" > try { > val data = sc.parallelize(1 to 4, 4).collect().toSet > assert(sc.listenerBus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS)) > data should be (Set(1, 2, 3, 4)) > result = "success" > } finally { > sc.stop() > Files.write(result, status, UTF_8) > } > {code} > But, the usage is wrong because Executors will spawn during initializing > SparkContext and SparkContext#addSparkListener should be invoked after the > initialization, thus after Executors spawn, so SaveExecutorInfo cannot handle > ExecutorAddedEvent. > Following code refers the result of the handling ExecutorAddedEvent. Because > of the reason above, we cannot reach the assertion. 
> {code} > // verify log urls are present > listener.addedExecutorInfos.values.foreach { info => > assert(info.logUrlMap.nonEmpty) > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
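A common way to avoid this race is to register the listener through the `spark.extraListeners` configuration (available since Spark 1.3), so that it is installed while the SparkContext is still initializing and therefore observes the early ExecutorAddedEvent. The Scala sketch below illustrates only the technique; it is not the actual patch in the pull request above:

```scala
import scala.collection.mutable

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorAdded}
import org.apache.spark.scheduler.cluster.ExecutorInfo

// The listener class needs a zero-argument constructor so that Spark
// can instantiate it from the configuration value.
class SaveExecutorInfo extends SparkListener {
  val addedExecutorInfos = mutable.Map[String, ExecutorInfo]()
  override def onExecutorAdded(executor: SparkListenerExecutorAdded): Unit = {
    addedExecutorInfos(executor.executorId) = executor.executorInfo
  }
}

val conf = new SparkConf()
  .setAppName("yarn test app")
  // Registered during SparkContext construction, before executors spawn,
  // unlike sc.addSparkListener, which can only run after the constructor returns.
  .set("spark.extraListeners", classOf[SaveExecutorInfo].getName)
val sc = new SparkContext(conf)
```

The test would then retrieve the registered instance (for example through a companion object set from the constructor, an assumption here) instead of the `listener` variable assigned after SparkContext creation.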
[jira] [Assigned] (SPARK-6769) Usage of the ListenerBus in YarnClusterSuite is wrong
[ https://issues.apache.org/jira/browse/SPARK-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6769: --- Assignee: (was: Apache Spark) > Usage of the ListenerBus in YarnClusterSuite is wrong > - > > Key: SPARK-6769 > URL: https://issues.apache.org/jira/browse/SPARK-6769 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests, YARN >Affects Versions: 1.4.0 >Reporter: Kousuke Saruta >Priority: Minor > > In YarnClusterSuite, a test case uses `SaveExecutorInfo` to handle > ExecutorAddedEvent as follows. > {code} > private class SaveExecutorInfo extends SparkListener { > val addedExecutorInfos = mutable.Map[String, ExecutorInfo]() > override def onExecutorAdded(executor: SparkListenerExecutorAdded) { > addedExecutorInfos(executor.executorId) = executor.executorInfo > } > } > ... > listener = new SaveExecutorInfo > val sc = new SparkContext(new SparkConf() > .setAppName("yarn \"test app\" 'with quotes' and \\back\\slashes and > $dollarSigns")) > sc.addSparkListener(listener) > val status = new File(args(0)) > var result = "failure" > try { > val data = sc.parallelize(1 to 4, 4).collect().toSet > assert(sc.listenerBus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS)) > data should be (Set(1, 2, 3, 4)) > result = "success" > } finally { > sc.stop() > Files.write(result, status, UTF_8) > } > {code} > But the usage is wrong, because Executors are spawned while the SparkContext is > being initialized, and SparkContext#addSparkListener can only be invoked after the > initialization, that is, after the Executors have already spawned, so > SaveExecutorInfo cannot handle ExecutorAddedEvent. > The following code refers to the result of handling ExecutorAddedEvent. Because > of the reason above, we can never reach the assertion. 
> {code} > // verify log urls are present > listener.addedExecutorInfos.values.foreach { info => > assert(info.logUrlMap.nonEmpty) > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1499) Workers continuously produce failing executors
[ https://issues.apache.org/jira/browse/SPARK-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484963#comment-14484963 ] XeonZhao commented on SPARK-1499: - How can this problem be solved? I have encountered this issue as well. > Workers continuously produce failing executors > -- > > Key: SPARK-1499 > URL: https://issues.apache.org/jira/browse/SPARK-1499 > Project: Spark > Issue Type: Bug > Components: Deploy, Spark Core >Affects Versions: 0.9.1, 1.0.0 >Reporter: Aaron Davidson > > If a node is in a bad state, such that newly started executors fail on > startup or first use, the Standalone Cluster Worker will happily keep > spawning new ones. A better behavior would be for a Worker to mark itself as > dead if it has had a history of continuously producing erroneous executors, > or else to somehow prevent a driver from re-registering executors from the > same machine repeatedly. > Reported on mailing list: > http://mail-archives.apache.org/mod_mbox/spark-user/201404.mbox/%3ccal8t0bqjfgtf-vbzjq6yj7ckbl_9p9s0trvew2mvg6zbngx...@mail.gmail.com%3E > Relevant logs: > {noformat} > 14/04/11 19:06:52 INFO client.AppClient$ClientActor: Executor updated: > app-20140411190649-0008/4 is now FAILED (Command exited with code 53) > 14/04/11 19:06:52 INFO cluster.SparkDeploySchedulerBackend: Executor > app-20140411190649-0008/4 removed: Command exited with code 53 > 14/04/11 19:06:52 INFO cluster.SparkDeploySchedulerBackend: Executor 4 > disconnected, so removing it > 14/04/11 19:06:52 ERROR scheduler.TaskSchedulerImpl: Lost an executor 4 > (already removed): Failed to create local directory (bad spark.local.dir?) 
> 14/04/11 19:06:52 INFO client.AppClient$ClientActor: Executor added: > app-20140411190649-0008/27 on > worker-20140409212012-ip-172-31-19-11.us-west-1.compute.internal-58614 > (ip-172-31-19-11.us-west-1.compute.internal:58614) with 8 cores > 14/04/11 19:06:52 INFO cluster.SparkDeploySchedulerBackend: Granted executor > ID app-20140411190649-0008/27 on hostPort > ip-172-31-19-11.us-west-1.compute.internal:58614 with 8 cores, 56.9 GB RAM > 14/04/11 19:06:52 INFO client.AppClient$ClientActor: Executor updated: > app-20140411190649-0008/27 is now RUNNING > 14/04/11 19:06:52 INFO storage.BlockManagerMasterActor$BlockManagerInfo: > Registering block manager ip-172-31-24-76.us-west-1.compute.internal:50256 > with 32.7 GB RAM > 14/04/11 19:06:52 INFO metastore.HiveMetaStore: 0: get_table : db=default > tbl=wikistats_pd > 14/04/11 19:06:52 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr > cmd=get_table : db=default tbl=wikistats_pd > 14/04/11 19:06:53 DEBUG hive.log: DDL: struct wikistats_pd { string > projectcode, string pagename, i32 pageviews, i32 bytes} > 14/04/11 19:06:53 DEBUG lazy.LazySimpleSerDe: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe initialized with: > columnNames=[projectcode, pagename, pageviews, bytes] columnTypes=[string, > string, int, int] separator=[[B@29a81175] nullstring=\N > lastColumnTakesRest=false > shark> 14/04/11 19:06:55 INFO cluster.SparkDeploySchedulerBackend: Registered > executor: > Actor[akka.tcp://sparkexecu...@ip-172-31-19-11.us-west-1.compute.internal:45248/user/Executor#-1002203295] > with ID 27 > show 14/04/11 19:06:56 INFO cluster.SparkDeploySchedulerBackend: Executor 27 > disconnected, so removing it > 14/04/11 19:06:56 ERROR scheduler.TaskSchedulerImpl: Lost an executor 27 > (already removed): remote Akka client disassociated > 14/04/11 19:06:56 INFO client.AppClient$ClientActor: Executor updated: > app-20140411190649-0008/27 is now FAILED (Command exited with code 53) > 14/04/11 19:06:56 INFO 
cluster.SparkDeploySchedulerBackend: Executor > app-20140411190649-0008/27 removed: Command exited with code 53 > 14/04/11 19:06:56 INFO client.AppClient$ClientActor: Executor added: > app-20140411190649-0008/28 on > worker-20140409212012-ip-172-31-19-11.us-west-1.compute.internal-58614 > (ip-172-31-19-11.us-west-1.compute.internal:58614) with 8 cores > 14/04/11 19:06:56 INFO cluster.SparkDeploySchedulerBackend: Granted executor > ID app-20140411190649-0008/28 on hostPort > ip-172-31-19-11.us-west-1.compute.internal:58614 with 8 cores, 56.9 GB RAM > 14/04/11 19:06:56 INFO client.AppClient$ClientActor: Executor updated: > app-20140411190649-0008/28 is now RUNNING > tables; > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
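The mitigation described in the issue (a Worker marking itself dead after a run of failing executors) can be sketched as a small failure tracker. This is an illustrative Scala sketch under assumed names such as `maxConsecutiveFailures`; it is not Spark's actual Worker code:

```scala
// Tracks consecutive executor failures on a Worker; after a configurable
// streak the Worker stops accepting launch requests instead of spawning
// replacements forever.
class ExecutorFailureTracker(maxConsecutiveFailures: Int = 10) {
  private var consecutiveFailures = 0
  @volatile var markedDead = false

  def onExecutorExit(exitCode: Int): Unit = synchronized {
    if (exitCode == 0) {
      consecutiveFailures = 0        // a clean exit resets the streak
    } else {
      consecutiveFailures += 1       // e.g. "Command exited with code 53" above
      if (consecutiveFailures >= maxConsecutiveFailures) {
        markedDead = true            // let the Master schedule executors elsewhere
      }
    }
  }
}
```

Resetting the counter on a clean exit is the important design choice: it distinguishes a persistently bad node (such as the bad spark.local.dir in the log) from occasional failures on a healthy one.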
[jira] [Updated] (SPARK-6768) Do not support "float/double union decimal and decimal(a ,b) union decimal(c, d)"
[ https://issues.apache.org/jira/browse/SPARK-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhongshuai Pei updated SPARK-6768: -- Description: Do not support sql like that : ``` select cast(12.2056999 as float) from testData limit 1 union select cast(12.2041 as decimal(7, 4)) from testData limit 1 ``` ``` select cast(12.2056999 as double) from testData limit 1 union select cast(12.2041 as decimal(7, 4)) from testData limit 1 ``` ``` select cast(1241.20 as decimal(6, 2)) from testData limit 1 union select cast(1.204 as decimal(5, 3)) from testData limit 1 ``` was: Do not support sql like that : ... select cast(12.2056999 as float) from testData limit 1 union select cast(12.2041 as decimal(7, 4)) from testData limit 1 ... ``` select cast(12.2056999 as double) from testData limit 1 union select cast(12.2041 as decimal(7, 4)) from testData limit 1 ``` ``` select cast(1241.20 as decimal(6, 2)) from testData limit 1 union select cast(1.204 as decimal(5, 3)) from testData limit 1 ``` > Do not support "float/double union decimal and decimal(a ,b) union decimal(c, > d)" > - > > Key: SPARK-6768 > URL: https://issues.apache.org/jira/browse/SPARK-6768 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Zhongshuai Pei > > Do not support sql like that : > ``` > select cast(12.2056999 as float) from testData limit 1 > union > select cast(12.2041 as decimal(7, 4)) from testData limit 1 > ``` > ``` > select cast(12.2056999 as double) from testData limit 1 > union > select cast(12.2041 as decimal(7, 4)) from testData limit 1 > ``` > ``` > select cast(1241.20 as decimal(6, 2)) from testData limit 1 > union > select cast(1.204 as decimal(5, 3)) from testData limit 1 > ``` -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6768) Do not support "float/double union decimal or decimal(a ,b) union decimal(c, d)"
[ https://issues.apache.org/jira/browse/SPARK-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhongshuai Pei updated SPARK-6768: -- Summary: Do not support "float/double union decimal or decimal(a ,b) union decimal(c, d)" (was: Do not support "float/double union decimal and decimal(a ,b) union decimal(c, d)") > Do not support "float/double union decimal or decimal(a ,b) union decimal(c, > d)" > > > Key: SPARK-6768 > URL: https://issues.apache.org/jira/browse/SPARK-6768 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Zhongshuai Pei > > Do not support sql like that : > ... > select cast(12.2056999 as float) from testData limit 1 > union > select cast(12.2041 as decimal(7, 4)) from testData limit 1 > ... > ``` > select cast(12.2056999 as double) from testData limit 1 > union > select cast(12.2041 as decimal(7, 4)) from testData limit 1 > ``` > ``` > select cast(1241.20 as decimal(6, 2)) from testData limit 1 > union > select cast(1.204 as decimal(5, 3)) from testData limit 1 > ``` -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6768) Do not support "float/double union decimal or decimal(a ,b) union decimal(c, d)"
[ https://issues.apache.org/jira/browse/SPARK-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhongshuai Pei updated SPARK-6768: -- Description: Do not support sql like that : select cast(12.2056999 as float) from testData limit 1 union select cast(12.2041 as decimal(7, 4)) from testData limit 1 select cast(12.2056999 as double) from testData limit 1 union select cast(12.2041 as decimal(7, 4)) from testData limit 1 select cast(1241.20 as decimal(6, 2)) from testData limit 1 union select cast(1.204 as decimal(5, 3)) from testData limit 1 was: Do not support sql like that : ... select cast(12.2056999 as float) from testData limit 1 union select cast(12.2041 as decimal(7, 4)) from testData limit 1 ... ``` select cast(12.2056999 as double) from testData limit 1 union select cast(12.2041 as decimal(7, 4)) from testData limit 1 ``` ``` select cast(1241.20 as decimal(6, 2)) from testData limit 1 union select cast(1.204 as decimal(5, 3)) from testData limit 1 ``` > Do not support "float/double union decimal or decimal(a ,b) union decimal(c, > d)" > > > Key: SPARK-6768 > URL: https://issues.apache.org/jira/browse/SPARK-6768 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Zhongshuai Pei > > Do not support sql like that : > select cast(12.2056999 as float) from testData limit 1 > union > select cast(12.2041 as decimal(7, 4)) from testData limit 1 > select cast(12.2056999 as double) from testData limit 1 > union > select cast(12.2041 as decimal(7, 4)) from testData limit 1 > select cast(1241.20 as decimal(6, 2)) from testData limit 1 > union > select cast(1.204 as decimal(5, 3)) from testData limit 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
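The usual resolution for this class of bug is to widen both UNION branches to a common type: float or double unioned with a decimal widens to double, while two decimals widen to a decimal whose integer digits and scale each cover both operands. A Scala sketch of that widening rule follows; the function name `widerDecimal` is illustrative and is not Spark's analyzer code:

```scala
// Given decimal(p1, s1) and decimal(p2, s2), compute a decimal(p, s) that can
// represent every value of both input types.
def widerDecimal(p1: Int, s1: Int, p2: Int, s2: Int): (Int, Int) = {
  val scale = math.max(s1, s2)                 // keep the finer fractional part
  val intDigits = math.max(p1 - s1, p2 - s2)   // keep the larger integer part
  (intDigits + scale, scale)
}

// Example from the report: decimal(6, 2) UNION decimal(5, 3)
// widerDecimal(6, 2, 5, 3) == (7, 3), i.e. decimal(7, 3)
```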
[jira] [Assigned] (SPARK-6769) Usage of the ListenerBus in YarnClusterSuite is wrong
[ https://issues.apache.org/jira/browse/SPARK-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6769: --- Assignee: Apache Spark > Usage of the ListenerBus in YarnClusterSuite is wrong > - > > Key: SPARK-6769 > URL: https://issues.apache.org/jira/browse/SPARK-6769 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests, YARN >Affects Versions: 1.4.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Minor > > In YarnClusterSuite, a test case uses `SaveExecutorInfo` to handle > ExecutorAddedEvent as follows. > {code} > private class SaveExecutorInfo extends SparkListener { > val addedExecutorInfos = mutable.Map[String, ExecutorInfo]() > override def onExecutorAdded(executor: SparkListenerExecutorAdded) { > addedExecutorInfos(executor.executorId) = executor.executorInfo > } > } > ... > listener = new SaveExecutorInfo > val sc = new SparkContext(new SparkConf() > .setAppName("yarn \"test app\" 'with quotes' and \\back\\slashes and > $dollarSigns")) > sc.addSparkListener(listener) > val status = new File(args(0)) > var result = "failure" > try { > val data = sc.parallelize(1 to 4, 4).collect().toSet > assert(sc.listenerBus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS)) > data should be (Set(1, 2, 3, 4)) > result = "success" > } finally { > sc.stop() > Files.write(result, status, UTF_8) > } > {code} > But the usage is wrong, because Executors are spawned while the SparkContext is > being initialized, and SparkContext#addSparkListener can only be invoked after the > initialization, that is, after the Executors have already spawned, so > SaveExecutorInfo cannot handle ExecutorAddedEvent. > The following code refers to the result of handling ExecutorAddedEvent. Because > of the reason above, we can never reach the assertion. 
> {code} > // verify log urls are present > listener.addedExecutorInfos.values.foreach { info => > assert(info.logUrlMap.nonEmpty) > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-6768) Do not support "float/double union decimal or decimal(a ,b) union decimal(c, d)"
[ https://issues.apache.org/jira/browse/SPARK-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6768: --- Assignee: (was: Apache Spark) > Do not support "float/double union decimal or decimal(a ,b) union decimal(c, > d)" > > > Key: SPARK-6768 > URL: https://issues.apache.org/jira/browse/SPARK-6768 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Zhongshuai Pei > > Do not support sql like that : > select cast(12.2056999 as float) from testData limit 1 > union > select cast(12.2041 as decimal(7, 4)) from testData limit 1 > select cast(12.2056999 as double) from testData limit 1 > union > select cast(12.2041 as decimal(7, 4)) from testData limit 1 > select cast(1241.20 as decimal(6, 2)) from testData limit 1 > union > select cast(1.204 as decimal(5, 3)) from testData limit 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6768) Do not support "float/double union decimal or decimal(a ,b) union decimal(c, d)"
[ https://issues.apache.org/jira/browse/SPARK-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484967#comment-14484967 ] Apache Spark commented on SPARK-6768: - User 'DoingDone9' has created a pull request for this issue: https://github.com/apache/spark/pull/5418 > Do not support "float/double union decimal or decimal(a ,b) union decimal(c, > d)" > > > Key: SPARK-6768 > URL: https://issues.apache.org/jira/browse/SPARK-6768 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Zhongshuai Pei > > Do not support sql like that : > select cast(12.2056999 as float) from testData limit 1 > union > select cast(12.2041 as decimal(7, 4)) from testData limit 1 > select cast(12.2056999 as double) from testData limit 1 > union > select cast(12.2041 as decimal(7, 4)) from testData limit 1 > select cast(1241.20 as decimal(6, 2)) from testData limit 1 > union > select cast(1.204 as decimal(5, 3)) from testData limit 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-6768) Do not support "float/double union decimal or decimal(a ,b) union decimal(c, d)"
[ https://issues.apache.org/jira/browse/SPARK-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6768: --- Assignee: Apache Spark > Do not support "float/double union decimal or decimal(a ,b) union decimal(c, > d)" > > > Key: SPARK-6768 > URL: https://issues.apache.org/jira/browse/SPARK-6768 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Zhongshuai Pei >Assignee: Apache Spark > > Do not support sql like that : > select cast(12.2056999 as float) from testData limit 1 > union > select cast(12.2041 as decimal(7, 4)) from testData limit 1 > select cast(12.2056999 as double) from testData limit 1 > union > select cast(12.2041 as decimal(7, 4)) from testData limit 1 > select cast(1241.20 as decimal(6, 2)) from testData limit 1 > union > select cast(1.204 as decimal(5, 3)) from testData limit 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6769) Usage of the ListenerBus in YarnClusterSuite is wrong
[ https://issues.apache.org/jira/browse/SPARK-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-6769: -- Component/s: (was: Spark Core) > Usage of the ListenerBus in YarnClusterSuite is wrong > - > > Key: SPARK-6769 > URL: https://issues.apache.org/jira/browse/SPARK-6769 > Project: Spark > Issue Type: Bug > Components: Tests, YARN >Affects Versions: 1.4.0 >Reporter: Kousuke Saruta >Priority: Minor > > In YarnClusterSuite, a test case uses `SaveExecutorInfo` to handle > ExecutorAddedEvent as follows. > {code} > private class SaveExecutorInfo extends SparkListener { > val addedExecutorInfos = mutable.Map[String, ExecutorInfo]() > override def onExecutorAdded(executor: SparkListenerExecutorAdded) { > addedExecutorInfos(executor.executorId) = executor.executorInfo > } > } > ... > listener = new SaveExecutorInfo > val sc = new SparkContext(new SparkConf() > .setAppName("yarn \"test app\" 'with quotes' and \\back\\slashes and > $dollarSigns")) > sc.addSparkListener(listener) > val status = new File(args(0)) > var result = "failure" > try { > val data = sc.parallelize(1 to 4, 4).collect().toSet > assert(sc.listenerBus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS)) > data should be (Set(1, 2, 3, 4)) > result = "success" > } finally { > sc.stop() > Files.write(result, status, UTF_8) > } > {code} > But the usage is wrong, because Executors are spawned while the SparkContext is > being initialized, and SparkContext#addSparkListener can only be invoked after the > initialization, that is, after the Executors have already spawned, so > SaveExecutorInfo cannot handle ExecutorAddedEvent. > The following code refers to the result of handling ExecutorAddedEvent. Because > of the reason above, we can never reach the assertion. 
> {code} > // verify log urls are present > listener.addedExecutorInfos.values.foreach { info => > assert(info.logUrlMap.nonEmpty) > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6733) Suppression of usage of Scala existential code should be done
[ https://issues.apache.org/jira/browse/SPARK-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6733: - Assignee: Vinod KC > Suppression of usage of Scala existential code should be done > - > > Key: SPARK-6733 > URL: https://issues.apache.org/jira/browse/SPARK-6733 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 1.3.0 > Environment: OS: OSX Yosemite > Hardware: Intel Core i7 with 16 GB RAM >Reporter: Raymond Tay >Assignee: Vinod KC >Priority: Trivial > Fix For: 1.4.0 > > > The inclusion of this statement in the file > {code:scala} > import scala.language.existentials > {code} > should have suppressed all warnings regarding the use of scala existential > code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
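For context, `scala.language.existentials` is a file-wide feature import: once present, it acknowledges the existential-types language feature and should silence the corresponding `-feature` warnings everywhere in that file. A minimal illustrative sketch:

```scala
// With -feature enabled, scalac warns when an inferred type involves an
// existential such as Class[_]; this import acknowledges the feature for
// the whole file and should suppress all such warnings.
import scala.language.existentials

val classes: Seq[Class[_]] = Seq(classOf[String], classOf[Int])
// The inferred element type of `named` mentions Class[_], an existential.
val named = classes.map(c => (c.getName, c))
```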
[jira] [Updated] (SPARK-6770) DirectKafkaInputDStream has not been initialized when recovery from checkpoint
[ https://issues.apache.org/jira/browse/SPARK-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yangping wu updated SPARK-6770: --- Description: I am reading data from Kafka using the createDirectStream method and saving the received log to MySQL; the code snippet is as follows {code} def functionToCreateContext(): StreamingContext = { val sparkConf = new SparkConf() val sc = new SparkContext(sparkConf) val ssc = new StreamingContext(sc, Seconds(10)) ssc.checkpoint("/tmp/kafka/channel/offset") // set checkpoint directory ssc } val struct = StructType(StructField("log", StringType) ::Nil) // Get StreamingContext from checkpoint data or create a new one val ssc = StreamingContext.getOrCreate("/tmp/kafka/channel/offset", functionToCreateContext) val SDB = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics) val sqlContext = new org.apache.spark.sql.SQLContext(ssc.sparkContext) SDB.foreachRDD(rdd => { val result = rdd.map(item => { println(item) val result = item._2 match { case e: String => Row.apply(e) case _ => Row.apply("") } result }) println(result.count()) val df = sqlContext.createDataFrame(result, struct) df.insertIntoJDBC(url, "test", overwrite = false) }) ssc.start() ssc.awaitTermination() ssc.stop() {code} But when I recover the program from the checkpoint, I encounter an exception: {code} Exception in thread "main" org.apache.spark.SparkException: org.apache.spark.streaming.kafka.DirectKafkaInputDStream@41a80e5a has not been initialized at org.apache.spark.streaming.dstream.DStream.isTimeValid(DStream.scala:266) at org.apache.spark.streaming.dstream.InputDStream.isTimeValid(InputDStream.scala:51) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:287) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:287) at scala.Option.orElse(Option.scala:257) at 
org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:284) at org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:38) at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:116) at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:116) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) at org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:116) at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$restart$4.apply(JobGenerator.scala:223) at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$restart$4.apply(JobGenerator.scala:218) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) at org.apache.spark.streaming.scheduler.JobGenerator.restart(JobGenerator.scala:218) at org.apache.spark.streaming.scheduler.JobGenerator.start(JobGenerator.scala:89) at org.apache.spark.streaming.scheduler.JobScheduler.start(JobScheduler.scala:67) at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:512) at logstatstreaming.UserChannelTodb$.main(UserChannelTodb.scala:57) at logstatstreaming.UserChannelTodb.main(UserChannelTodb.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {code} Not sure if this is a bug or a feature, but it's not obvious, so wanted to create a JIRA to make sure we document this behavior. Can someone help me to see th
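This exception typically means the DStream graph was built outside the function passed to `StreamingContext.getOrCreate`, so on recovery the restored checkpoint does not correspond to a freshly initialized stream. A hedged Scala sketch of the conventional fix, moving all stream setup into the factory function (`kafkaParams` and `topics` are assumed to be defined as in the report):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Everything that defines the streaming computation, including
// createDirectStream and the foreachRDD output action, must live inside the
// factory function, so that recovery restores a fully initialized graph.
def functionToCreateContext(): StreamingContext = {
  val sc = new SparkContext(new SparkConf())
  val ssc = new StreamingContext(sc, Seconds(10))
  ssc.checkpoint("/tmp/kafka/channel/offset")

  val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, topics)
  stream.foreachRDD { rdd =>
    // ... convert rdd and write it to MySQL as in the original snippet ...
  }
  ssc  // used only on the first run; on restart the checkpointed graph is restored
}

val ssc = StreamingContext.getOrCreate("/tmp/kafka/channel/offset", functionToCreateContext)
ssc.start()
ssc.awaitTermination()
```

In the original snippet only the checkpoint directory is set inside `functionToCreateContext`, while `createDirectStream` and `foreachRDD` run outside it, which is exactly the pattern the Spark Streaming checkpointing documentation warns against.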
[jira] [Created] (SPARK-6770) DirectKafkaInputDStream has not been initialized when recovery from checkpoint
yangping wu created SPARK-6770: -- Summary: DirectKafkaInputDStream has not been initialized when recovery from checkpoint Key: SPARK-6770 URL: https://issues.apache.org/jira/browse/SPARK-6770 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.3.0 Reporter: yangping wu I am reading data from Kafka using the createDirectStream method and saving the received log to MySQL; the code snippet is as follows {code} def functionToCreateContext(): StreamingContext = { val sparkConf = new SparkConf() val sc = new SparkContext(sparkConf) val ssc = new StreamingContext(sc, Seconds(10)) ssc.checkpoint("/tmp/kafka/channel/offset") // set checkpoint directory ssc } val struct = StructType(StructField("log", StringType) ::Nil) // Get StreamingContext from checkpoint data or create a new one val ssc = StreamingContext.getOrCreate("/tmp/kafka/channel/offset", functionToCreateContext) val SDB = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics) val sqlContext = new org.apache.spark.sql.SQLContext(ssc.sparkContext) SDB.foreachRDD(rdd => { val result = rdd.map(item => { println(item) val result = item._2 match { case e: String => Row.apply(e) case _ => Row.apply("") } result }) println(result.count()) val df = sqlContext.createDataFrame(result, struct) df.insertIntoJDBC(url, "test", overwrite = false) }) ssc.start() ssc.awaitTermination() ssc.stop() {code} But when I recover the program from the checkpoint, I encounter an exception: {code} Exception in thread "main" org.apache.spark.SparkException: org.apache.spark.streaming.kafka.DirectKafkaInputDStream@41a80e5a has not been initialized at org.apache.spark.streaming.dstream.DStream.isTimeValid(DStream.scala:266) at org.apache.spark.streaming.dstream.InputDStream.isTimeValid(InputDStream.scala:51) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:287) at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:287) at scala.Option.orElse(Option.scala:257) at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:284) at org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:38) at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:116) at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:116) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) at org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:116) at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$restart$4.apply(JobGenerator.scala:223) at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$restart$4.apply(JobGenerator.scala:218) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) at org.apache.spark.streaming.scheduler.JobGenerator.restart(JobGenerator.scala:218) at org.apache.spark.streaming.scheduler.JobGenerator.start(JobGenerator.scala:89) at org.apache.spark.streaming.scheduler.JobScheduler.start(JobScheduler.scala:67) at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:512) at logstatstreaming.UserChannelTodb$.main(UserChannelTodb.scala:57) at logstatstreaming.UserChannelTodb.main(UserChannelTodb.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110) at org.apache.spark
[jira] [Updated] (SPARK-6770) DirectKafkaInputDStream has not been initialized when recovery from checkpoint
[ https://issues.apache.org/jira/browse/SPARK-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yangping wu updated SPARK-6770: --- Description: I am reading data from Kafka using the createDirectStream method and saving the received log to MySQL; the code snippet is as follows {code} def functionToCreateContext(): StreamingContext = { val sparkConf = new SparkConf() val sc = new SparkContext(sparkConf) val ssc = new StreamingContext(sc, Seconds(10)) ssc.checkpoint("/tmp/kafka/channel/offset") // set checkpoint directory ssc } val struct = StructType(StructField("log", StringType) ::Nil) // Get StreamingContext from checkpoint data or create a new one val ssc = StreamingContext.getOrCreate("/tmp/kafka/channel/offset", functionToCreateContext) val SDB = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics) val sqlContext = new org.apache.spark.sql.SQLContext(ssc.sparkContext) SDB.foreachRDD(rdd => { val result = rdd.map(item => { println(item) val result = item._2 match { case e: String => Row.apply(e) case _ => Row.apply("") } result }) println(result.count()) val df = sqlContext.createDataFrame(result, struct) df.insertIntoJDBC(url, "test", overwrite = false) }) ssc.start() ssc.awaitTermination() ssc.stop() {code} But when I recover the program from the checkpoint, I encounter an exception: {code} Exception in thread "main" org.apache.spark.SparkException: org.apache.spark.streaming.kafka.DirectKafkaInputDStream@41a80e5a has not been initialized at org.apache.spark.streaming.dstream.DStream.isTimeValid(DStream.scala:266) at org.apache.spark.streaming.dstream.InputDStream.isTimeValid(InputDStream.scala:51) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:287) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:287) at scala.Option.orElse(Option.scala:257) at 
org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:284) at org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:38) at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:116) at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:116) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) at org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:116) at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$restart$4.apply(JobGenerator.scala:223) at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$restart$4.apply(JobGenerator.scala:218) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) at org.apache.spark.streaming.scheduler.JobGenerator.restart(JobGenerator.scala:218) at org.apache.spark.streaming.scheduler.JobGenerator.start(JobGenerator.scala:89) at org.apache.spark.streaming.scheduler.JobScheduler.start(JobScheduler.scala:67) at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:512) at logstatstreaming.UserChannelTodb$.main(UserChannelTodb.scala:57) at logstatstreaming.UserChannelTodb.main(UserChannelTodb.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {code} Not sure if this is a bug or a feature, but it's not obvious, so I wanted to create a JIRA to make sure we document this behavior. Can someone help me to see the
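The report above matches Spark Streaming's documented checkpoint-recovery contract: when a context is recovered via StreamingContext.getOrCreate, only DStreams created inside the factory function are re-initialized from the checkpoint, so a DirectKafkaInputDStream created after getOrCreate is never initialized on restart. A hedged Scala sketch of the recommended structure follows; kafkaParams, topics, and the write-to-MySQL step are placeholders taken from the report, not verified code:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Sketch only: with getOrCreate, ALL DStream setup must live inside the
// factory, so that on recovery the whole graph is rebuilt from the checkpoint.
def functionToCreateContext(): StreamingContext = {
  val ssc = new StreamingContext(new SparkConf(), Seconds(10))
  ssc.checkpoint("/tmp/kafka/channel/offset")
  // Create the direct stream and register every output operation HERE,
  // not after getOrCreate (kafkaParams and topics assumed defined elsewhere).
  val stream = KafkaUtils.createDirectStream[String, String,
    StringDecoder, StringDecoder](ssc, kafkaParams, topics)
  stream.foreachRDD { rdd => /* convert rows and write to MySQL as in the report */ }
  ssc
}

val ssc = StreamingContext.getOrCreate("/tmp/kafka/channel/offset",
  functionToCreateContext _)
ssc.start()
ssc.awaitTermination()
```

On the first run the factory executes and the full graph is checkpointed; on restart the graph, including the Kafka stream and its foreachRDD output, is restored from the checkpoint instead of being re-created, which avoids the "has not been initialized" error.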
[jira] [Resolved] (SPARK-6699) PySpark Access Denied error in Windows seen only in ver 1.3
[ https://issues.apache.org/jira/browse/SPARK-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-6699. -- Resolution: Not A Problem This looks strongly like a local permissions issue and/or lack of numpy, but can be reopened if there is more follow-up about the cause of the error. > PySpark Access Denied error in Windows seen only in ver 1.3 > --- > > Key: SPARK-6699 > URL: https://issues.apache.org/jira/browse/SPARK-6699 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.3.0 > Environment: Windows 8.1 x64 > Windows 7 SP1 x64 >Reporter: RoCm > > Downloaded version 1.3 and tried to run pyspark > I hit this error and am unable to proceed (versions 1.2 and 1.1 work fine) > Pasting the error logs below > C:\Users\roXYZ\.babun\cygwin\home\roXYZ\spark-1.3.0-bin-hadoop2.4\bin>pyspark > Running python with > PYTHONPATH=C:\Users\roXYZ\.babun\cygwin\home\roXYZ\spark-1.3.0-bin-hadoop2.4\bin\..\python\lib\py4j-0.8.2.1-src.zip; > C:\Users\roXYZ\.babun\cygwin\home\roXYZ\spark-1.3.0-bin-hadoop2.4\bin\..\python; > Python 2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)] on > win32 > Type "help", "copyright", "credits" or "license" for more information. 
> No module named numpy > Traceback (most recent call last): File > "C:\Users\roXYZ\.babun\cygwin\home\roXYZ\spark-1.3.0-bin-hadoop2.4\bin\..\python\pyspark\shell.py", > line 50, in > sc = SparkContext(appName="PySparkShell", pyFiles=add_files) > File > "C:\Users\roXYZ\.babun\cygwin\home\roXYZ\spark-1.3.0-bin-hadoop2.4\python\pyspark\context.py", > line 108, in __init__ > SparkContext._ensure_initialized(self, gateway=gateway) > File > "C:\Users\roXYZ\.babun\cygwin\home\roXYZ\spark-1.3.0-bin-hadoop2.4\python\pyspark\context.py", > line 222, in _ensure_initialized > SparkContext._gateway = gateway or launch_gateway() > File > "C:\Users\roXYZ\.babun\cygwin\home\roXYZ\spark-1.3.0-bin-hadoop2.4\python\pyspark\java_gateway.py", > line 65, in launch_gateway > proc = Popen(command, stdin=PIPE, env=env) > File "C:\Python27\lib\subprocess.py", line 710, in __init__errread, > errwrite) > File "C:\Python27\lib\subprocess.py", line 958, in _execute_child > startupinfo) > WindowsError: [Error 5] Access is denied -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6771) Table alias in Spark SQL
Jacky19820629 created SPARK-6771: Summary: Table alias in Spark SQL Key: SPARK-6771 URL: https://issues.apache.org/jira/browse/SPARK-6771 Project: Spark Issue Type: Question Components: SQL Affects Versions: 1.2.1 Environment: Spark 1.2.1 built with Hive 0.13.1 on YARN 2.5.2 Reporter: Jacky19820629 I had cached several tables in memory. I found that Spark reads the data from Hadoop instead when I run a SQL query with a table alias; is there any configuration that can fix this? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-6681) JAVA_HOME error with upgrade to Spark 1.3.0
[ https://issues.apache.org/jira/browse/SPARK-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-6681. -- Resolution: Cannot Reproduce We can reopen this if there is a follow-up with more detail but looks like a YARN / env issue. > JAVA_HOME error with upgrade to Spark 1.3.0 > --- > > Key: SPARK-6681 > URL: https://issues.apache.org/jira/browse/SPARK-6681 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 1.3.0 > Environment: Client is Mac OS X version 10.10.2, cluster is running > HDP 2.1 stack. >Reporter: Ken Williams > > I’m trying to upgrade a Spark project, written in Scala, from Spark 1.2.1 to > 1.3.0, so I changed my `build.sbt` like so: > {code} > -libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1" % > "provided" > +libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % > "provided" > {code} > then make an `assembly` jar, and submit it: > {code} > HADOOP_CONF_DIR=/etc/hadoop/conf \ > spark-submit \ > --driver-class-path=/etc/hbase/conf \ > --conf spark.hadoop.validateOutputSpecs=false \ > --conf > spark.yarn.jar=hdfs:/apps/local/spark-assembly-1.3.0-hadoop2.4.0.jar \ > --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \ > --deploy-mode=cluster \ > --master=yarn \ > --class=TestObject \ > --num-executors=54 \ > target/scala-2.11/myapp-assembly-1.2.jar > {code} > The job fails to submit, with the following exception in the terminal: > {code} > 15/03/19 10:30:07 INFO yarn.Client: > 15/03/19 10:20:03 INFO yarn.Client: >client token: N/A >diagnostics: Application application_1420225286501_4698 failed 2 times > due to AM > Container for appattempt_1420225286501_4698_02 exited with > exitCode: 127 > due to: Exception from container-launch: > org.apache.hadoop.util.Shell$ExitCodeException: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) > at org.apache.hadoop.util.Shell.run(Shell.java:379) > at > 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {code} > Finally, I go and check the YARN app master’s web interface (since the job is > there, I know it at least made it that far), and the only logs it shows are > these: > {code} > Log Type: stderr > Log Length: 61 > /bin/bash: {{JAVA_HOME}}/bin/java: No such file or directory > > Log Type: stdout > Log Length: 0 > {code} > I’m not sure how to interpret that - is {{ {{JAVA_HOME}} }} a literal > (including the brackets) that’s somehow making it into a script? Is this > coming from the worker nodes or the driver? Anything I can do to experiment > & troubleshoot? > I do have {{JAVA_HOME}} set in the hadoop config files on all the nodes of > the cluster: > {code} > % grep JAVA_HOME /etc/hadoop/conf/*.sh > /etc/hadoop/conf/hadoop-env.sh:export JAVA_HOME=/usr/jdk64/jdk1.6.0_31 > /etc/hadoop/conf/yarn-env.sh:export JAVA_HOME=/usr/jdk64/jdk1.6.0_31 > {code} > Has this behavior changed in 1.3.0 since 1.2.1? Using 1.2.1 and making no > other changes, the job completes fine. > (Note: I originally posted this on the Spark mailing list and also on Stack > Overflow, I'll update both places if/when I find a solution.) 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-1499) Workers continuously produce failing executors
[ https://issues.apache.org/jira/browse/SPARK-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XeonZhao updated SPARK-1499: Comment: was deleted (was: How to solve this problem ?I have encountered this issue.) > Workers continuously produce failing executors > -- > > Key: SPARK-1499 > URL: https://issues.apache.org/jira/browse/SPARK-1499 > Project: Spark > Issue Type: Bug > Components: Deploy, Spark Core >Affects Versions: 0.9.1, 1.0.0 >Reporter: Aaron Davidson > > If a node is in a bad state, such that newly started executors fail on > startup or first use, the Standalone Cluster Worker will happily keep > spawning new ones. A better behavior would be for a Worker to mark itself as > dead if it has had a history of continuously producing erroneous executors, > or else to somehow prevent a driver from re-registering executors from the > same machine repeatedly. > Reported on mailing list: > http://mail-archives.apache.org/mod_mbox/spark-user/201404.mbox/%3ccal8t0bqjfgtf-vbzjq6yj7ckbl_9p9s0trvew2mvg6zbngx...@mail.gmail.com%3E > Relevant logs: > {noformat} > 14/04/11 19:06:52 INFO client.AppClient$ClientActor: Executor updated: > app-20140411190649-0008/4 is now FAILED (Command exited with code 53) > 14/04/11 19:06:52 INFO cluster.SparkDeploySchedulerBackend: Executor > app-20140411190649-0008/4 removed: Command exited with code 53 > 14/04/11 19:06:52 INFO cluster.SparkDeploySchedulerBackend: Executor 4 > disconnected, so removing it > 14/04/11 19:06:52 ERROR scheduler.TaskSchedulerImpl: Lost an executor 4 > (already removed): Failed to create local directory (bad spark.local.dir?) 
> 14/04/11 19:06:52 INFO client.AppClient$ClientActor: Executor added: > app-20140411190649-0008/27 on > worker-20140409212012-ip-172-31-19-11.us-west-1.compute.internal-58614 > (ip-172-31-19-11.us-west-1.compute.internal:58614) with 8 cores > 14/04/11 19:06:52 INFO cluster.SparkDeploySchedulerBackend: Granted executor > ID app-20140411190649-0008/27 on hostPort > ip-172-31-19-11.us-west-1.compute.internal:58614 with 8 cores, 56.9 GB RAM > 14/04/11 19:06:52 INFO client.AppClient$ClientActor: Executor updated: > app-20140411190649-0008/27 is now RUNNING > 14/04/11 19:06:52 INFO storage.BlockManagerMasterActor$BlockManagerInfo: > Registering block manager ip-172-31-24-76.us-west-1.compute.internal:50256 > with 32.7 GB RAM > 14/04/11 19:06:52 INFO metastore.HiveMetaStore: 0: get_table : db=default > tbl=wikistats_pd > 14/04/11 19:06:52 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr > cmd=get_table : db=default tbl=wikistats_pd > 14/04/11 19:06:53 DEBUG hive.log: DDL: struct wikistats_pd { string > projectcode, string pagename, i32 pageviews, i32 bytes} > 14/04/11 19:06:53 DEBUG lazy.LazySimpleSerDe: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe initialized with: > columnNames=[projectcode, pagename, pageviews, bytes] columnTypes=[string, > string, int, int] separator=[[B@29a81175] nullstring=\N > lastColumnTakesRest=false > shark> 14/04/11 19:06:55 INFO cluster.SparkDeploySchedulerBackend: Registered > executor: > Actor[akka.tcp://sparkexecu...@ip-172-31-19-11.us-west-1.compute.internal:45248/user/Executor#-1002203295] > with ID 27 > show 14/04/11 19:06:56 INFO cluster.SparkDeploySchedulerBackend: Executor 27 > disconnected, so removing it > 14/04/11 19:06:56 ERROR scheduler.TaskSchedulerImpl: Lost an executor 27 > (already removed): remote Akka client disassociated > 14/04/11 19:06:56 INFO client.AppClient$ClientActor: Executor updated: > app-20140411190649-0008/27 is now FAILED (Command exited with code 53) > 14/04/11 19:06:56 INFO 
cluster.SparkDeploySchedulerBackend: Executor > app-20140411190649-0008/27 removed: Command exited with code 53 > 14/04/11 19:06:56 INFO client.AppClient$ClientActor: Executor added: > app-20140411190649-0008/28 on > worker-20140409212012-ip-172-31-19-11.us-west-1.compute.internal-58614 > (ip-172-31-19-11.us-west-1.compute.internal:58614) with 8 cores > 14/04/11 19:06:56 INFO cluster.SparkDeploySchedulerBackend: Granted executor > ID app-20140411190649-0008/28 on hostPort > ip-172-31-19-11.us-west-1.compute.internal:58614 with 8 cores, 56.9 GB RAM > 14/04/11 19:06:56 INFO client.AppClient$ClientActor: Executor updated: > app-20140411190649-0008/28 is now RUNNING > tables; > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6772) spark sql error when running code on large number of records
Aditya Parmar created SPARK-6772: Summary: spark sql error when running code on large number of records Key: SPARK-6772 URL: https://issues.apache.org/jira/browse/SPARK-6772 Project: Spark Issue Type: Test Components: SQL Affects Versions: 1.2.0 Reporter: Aditya Parmar Hi all, I am getting an ArrayIndexOutOfBoundsException when I try to run a simple column-filtering query on a file with 2.5 lakh (250,000) records. It runs fine on a file with 2k records. 15/04/08 16:54:06 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 3, blrwfl11189.igatecorp.com, PROCESS_LOCAL, 1350 bytes) 15/04/08 16:54:06 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 2) on executor blrwfl11189.igatecorp.com: java.lang.ArrayIndexOutOfBoundsException (null) [duplicate 1] 15/04/08 16:54:06 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 4, blrwfl11189.igatecorp.com, PROCESS_LOCAL, 1350 bytes) 15/04/08 16:54:06 INFO TaskSetManager: Lost task 1.2 in stage 0.0 (TID 4) on executor blrwfl11189.igatecorp.com: java.lang.ArrayIndexOutOfBoundsException (null) [duplicate 2] 15/04/08 16:54:06 INFO TaskSetManager: Starting task 1.3 in stage 0.0 (TID 5, blrwfl11189.igatecorp.com, PROCESS_LOCAL, 1350 bytes) 15/04/08 16:54:06 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 3) on executor blrwfl11189.igatecorp.com: java.lang.ArrayIndexOutOfBoundsException (null) [duplicate 3] 15/04/08 16:54:06 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 6, blrwfl11189.igatecorp.com, PROCESS_LOCAL, 1350 bytes) 15/04/08 16:54:06 INFO TaskSetManager: Lost task 1.3 in stage 0.0 (TID 5) on executor blrwfl11189.igatecorp.com: java.lang.ArrayIndexOutOfBoundsException (null) [duplicate 4] 15/04/08 16:54:06 ERROR TaskSetManager: Task 1 in stage 0.0 failed 4 times; aborting job 15/04/08 16:54:06 INFO TaskSchedulerImpl: Cancelling stage 0 15/04/08 16:54:06 INFO TaskSchedulerImpl: Stage 0 was cancelled 15/04/08 16:54:06 INFO DAGScheduler: Job 0 failed: saveAsTextFile at JavaSchemaRDD.scala:42, took 
1.914477 s Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 5, blrwfl11189.igatecorp.com): java.lang.ArrayIndexOutOfBoundsException Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) at akka.dispatch.Mailbox.run(Mailbox.scala:220) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [aditya@blrwfl11189 ~]$ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
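One hedged observation on the report above: an ArrayIndexOutOfBoundsException that appears only on the larger input is often caused by a single malformed row whose split yields fewer fields than the query indexes, and a 2k-record sample may simply not contain such a row. A defensive-parsing sketch in Scala; the delimiter, expected column count, and file name are assumptions for illustration, not taken from the reporter's job:

```scala
// Sketch: filter out short rows before indexing split fields, so one bad
// record does not fail the whole stage. Assumes `sc` is an existing
// SparkContext and the input is comma-delimited with at least 4 columns.
val expectedCols = 4
val rows = sc.textFile("input.txt")
  .map(_.split(",", -1))              // -1 keeps trailing empty fields
  .filter(_.length >= expectedCols)   // drop malformed rows instead of crashing
  .map(f => (f(0), f(1), f(2)))       // indexing is now safe
```

Logging or counting the dropped rows (e.g. with an accumulator) would confirm whether malformed input is actually the cause.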
[jira] [Created] (SPARK-6773) check -license will passed in next time when rat jar download failed.
June created SPARK-6773: --- Summary: check -license will passed in next time when rat jar download failed. Key: SPARK-6773 URL: https://issues.apache.org/jira/browse/SPARK-6773 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.3.0 Reporter: June In dev/check-license, the script downloads the Rat jar if it does not exist. If the download fails, it reports an error: ** Attempting to fetch rat Our attempt to download rat locally to /home/spark/hejun/sparkgit/spark/lib/apache-rat-0.10.jar failed. Please install rat manually. * But if it is run again in the next cycle, the RAT check passes and the build continues, although an error is reported: ** Error: Invalid or corrupt jarfile /home/spark/hejun/sparkgit/spark/lib/apache-rat-0.10.jar RAT checks passed. * This is because: 1. The broken temporary rat.jar is not removed when the download fails, so the next run checks licenses using the corrupt rat.jar. 2. rat-results.txt is empty because rat.jar failed to run, so the RAT check passes. Suggestions: 1. Add a clean-up step when the rat.jar download fails. 2. Add error checking after the RAT run. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
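The two suggestions in the report can be sketched in shell; the function and file names below are illustrative, not taken from the actual dev/check-license script:

```shell
# Fix 1: never leave a partial/corrupt rat jar behind when the download
# fails, so the next run re-fetches it instead of trusting a broken file.
fetch_or_clean() {
  local jar="$1" url="$2"
  if ! curl --fail -sSL -o "$jar" "$url"; then
    rm -f "$jar"
    echo "rat download failed; removed partial $jar" >&2
    return 1
  fi
}

# Fix 2: treat an empty rat-results.txt as a failed RAT run, not a pass.
rat_results_ok() {
  local results="$1"
  [ -s "$results" ] || { echo "RAT produced no output" >&2; return 1; }
}
```

Together these make the second run fail fast (re-download, then refuse an empty results file) instead of silently reporting "RAT checks passed".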
[jira] [Updated] (SPARK-6773) check -license will passed in next time when rat jar download failed.
[ https://issues.apache.org/jira/browse/SPARK-6773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] June updated SPARK-6773: Description: In dev/check-license, it will download Rat jar if it not exist. if download failed, it will report error: ** Attempting to fetch rat Our attempt to download rat locally to /home/spark/hejun/sparkgit/spark/lib/apache-rat-0.10.jar failed. Please install rat manually. * but if run it again in next cycle, it will check RAT passed and go on building also an error reported: ** Error: Invalid or corrupt jarfile /home/spark/hejun/sparkgit/spark/lib/apache-rat-0.10.jar RAT checks passed. * This is because: 1. The error tmp rat.jar is not removed when rat jar download failed in last time. So it will go on checking license using the error rat.jar 2. The rat-results.txt is empty because rat.jar run failed, so RAT check passed. Suggest: 1. Add a clean step when rat.jar download faild. 2. Add a error checking logic after run rat checking. was: In dev/check-license, it will download Rat jar if it not exist. if download failed, it will report error: ** Attempting to fetch rat Our attempt to download rat locally to /home/spark/hejun/sparkgit/spark/lib/apache-rat-0.10.jar failed. Please install rat manually. * but if run it again in next cycle, it will check RAT passed and go on building also an error reported: ** Error: Invalid or corrupt jarfile /home/spark/hejun/sparkgit/spark/lib/apache-rat-0.10.jar RAT checks passed. * This is because: 1. The error tmp rat.jar is not removed when rat jar download failed in last time. So it will go on checking license using the error rat.jar 2. the rat-results.txt is empty because rat.jar run failed, so RAT faild. Suggest: 1. Add a clean step when rat.jar download faild. 2. Add a error checking logic after run rat checking. > check -license will passed in next time when rat jar download failed. 
> - > > Key: SPARK-6773 > URL: https://issues.apache.org/jira/browse/SPARK-6773 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.3.0 >Reporter: June > > In dev/check-license, it will download Rat jar if it not exist. if download > failed, it will report error: > ** > Attempting to fetch rat > Our attempt to download rat locally to > /home/spark/hejun/sparkgit/spark/lib/apache-rat-0.10.jar failed. Please > install rat manually. > * > but if run it again in next cycle, it will check RAT passed and go on > building also an error reported: > ** > Error: Invalid or corrupt jarfile > /home/spark/hejun/sparkgit/spark/lib/apache-rat-0.10.jar > RAT checks passed. > * > This is because: > 1. The error tmp rat.jar is not removed when rat jar download failed in last > time. So it will go on checking license using the error rat.jar > 2. The rat-results.txt is empty because rat.jar run failed, so RAT check > passed. > Suggest: > 1. Add a clean step when rat.jar download faild. > 2. Add a error checking logic after run rat checking. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-6773) check-license passes on the next run when the rat jar download fails.
[ https://issues.apache.org/jira/browse/SPARK-6773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6773: --- Assignee: Apache Spark > check-license passes on the next run when the rat jar download fails. > - > > Key: SPARK-6773 > URL: https://issues.apache.org/jira/browse/SPARK-6773 > Project: Spark > Issue Type: Bug > Components: Build > Affects Versions: 1.3.0 > Reporter: June > Assignee: Apache Spark -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6773) check-license passes on the next run when the rat jar download fails.
[ https://issues.apache.org/jira/browse/SPARK-6773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485156#comment-14485156 ] Apache Spark commented on SPARK-6773: - User 'sisihj' has created a pull request for this issue: https://github.com/apache/spark/pull/5421 > check-license passes on the next run when the rat jar download fails. > - > > Key: SPARK-6773 > URL: https://issues.apache.org/jira/browse/SPARK-6773 > Project: Spark > Issue Type: Bug > Components: Build > Affects Versions: 1.3.0 > Reporter: June -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-6773) check-license passes on the next run when the rat jar download fails.
[ https://issues.apache.org/jira/browse/SPARK-6773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6773: --- Assignee: (was: Apache Spark) > check-license passes on the next run when the rat jar download fails. > - > > Key: SPARK-6773 > URL: https://issues.apache.org/jira/browse/SPARK-6773 > Project: Spark > Issue Type: Bug > Components: Build > Affects Versions: 1.3.0 > Reporter: June -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5960) Allow AWS credentials to be passed to KinesisUtils.createStream()
[ https://issues.apache.org/jira/browse/SPARK-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485168#comment-14485168 ] Cihan Emre commented on SPARK-5960: --- [~cfregly] Are you working on this? I might be able to help if it's ok for you. > Allow AWS credentials to be passed to KinesisUtils.createStream() > - > > Key: SPARK-5960 > URL: https://issues.apache.org/jira/browse/SPARK-5960 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.1.0 >Reporter: Chris Fregly >Assignee: Chris Fregly > > While IAM roles are preferable, we're seeing a lot of cases where we need to > pass AWS credentials when creating the KinesisReceiver. > Notes: > * Make sure we don't log the credentials anywhere > * Maintain compatibility with existing KinesisReceiver-based code. > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6681) JAVA_HOME error with upgrade to Spark 1.3.0
[ https://issues.apache.org/jira/browse/SPARK-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485170#comment-14485170 ] Ken Williams commented on SPARK-6681: - Sounds good - I haven't been able to investigate it further, but I'll put any further info here if/when I find it. > JAVA_HOME error with upgrade to Spark 1.3.0 > --- > > Key: SPARK-6681 > URL: https://issues.apache.org/jira/browse/SPARK-6681 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 1.3.0 > Environment: Client is Mac OS X version 10.10.2, cluster is running > HDP 2.1 stack. >Reporter: Ken Williams > > I’m trying to upgrade a Spark project, written in Scala, from Spark 1.2.1 to > 1.3.0, so I changed my `build.sbt` like so: > {code} > -libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1" % > "provided" > +libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % > "provided" > {code} > then make an `assembly` jar, and submit it: > {code} > HADOOP_CONF_DIR=/etc/hadoop/conf \ > spark-submit \ > --driver-class-path=/etc/hbase/conf \ > --conf spark.hadoop.validateOutputSpecs=false \ > --conf > spark.yarn.jar=hdfs:/apps/local/spark-assembly-1.3.0-hadoop2.4.0.jar \ > --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \ > --deploy-mode=cluster \ > --master=yarn \ > --class=TestObject \ > --num-executors=54 \ > target/scala-2.11/myapp-assembly-1.2.jar > {code} > The job fails to submit, with the following exception in the terminal: > {code} > 15/03/19 10:30:07 INFO yarn.Client: > 15/03/19 10:20:03 INFO yarn.Client: >client token: N/A >diagnostics: Application application_1420225286501_4698 failed 2 times > due to AM > Container for appattempt_1420225286501_4698_02 exited with > exitCode: 127 > due to: Exception from container-launch: > org.apache.hadoop.util.Shell$ExitCodeException: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) > at 
org.apache.hadoop.util.Shell.run(Shell.java:379) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {code} > Finally, I go and check the YARN app master’s web interface (since the job is > there, I know it at least made it that far), and the only logs it shows are > these: > {code} > Log Type: stderr > Log Length: 61 > /bin/bash: {{JAVA_HOME}}/bin/java: No such file or directory > > Log Type: stdout > Log Length: 0 > {code} > I’m not sure how to interpret that - is {{ {{JAVA_HOME}} }} a literal > (including the brackets) that’s somehow making it into a script? Is this > coming from the worker nodes or the driver? Anything I can do to experiment > & troubleshoot? > I do have {{JAVA_HOME}} set in the hadoop config files on all the nodes of > the cluster: > {code} > % grep JAVA_HOME /etc/hadoop/conf/*.sh > /etc/hadoop/conf/hadoop-env.sh:export JAVA_HOME=/usr/jdk64/jdk1.6.0_31 > /etc/hadoop/conf/yarn-env.sh:export JAVA_HOME=/usr/jdk64/jdk1.6.0_31 > {code} > Has this behavior changed in 1.3.0 since 1.2.1? Using 1.2.1 and making no > other changes, the job completes fine. > (Note: I originally posted this on the Spark mailing list and also on Stack > Overflow, I'll update both places if/when I find a solution.) 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
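For the {{JAVA_HOME}} report above, one avenue worth testing (a workaround sketch under assumptions, not a confirmed fix for this issue) is to pin JAVA_HOME explicitly for the YARN containers instead of relying on the cluster-side hadoop-env.sh/yarn-env.sh expansion; Spark on YARN supports per-container environment settings via `spark.yarn.appMasterEnv.*` and `spark.executorEnv.*`. The JDK path shown is copied from the cluster config quoted above; the rest of the invocation mirrors the reporter's command.

```shell
# Workaround sketch: set JAVA_HOME explicitly for the YARN application
# master and the executors, bypassing the unexpanded placeholder.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.JAVA_HOME=/usr/jdk64/jdk1.6.0_31 \
  --conf spark.executorEnv.JAVA_HOME=/usr/jdk64/jdk1.6.0_31 \
  --class TestObject \
  target/scala-2.11/myapp-assembly-1.2.jar
```

If the literal placeholder still shows up in the container launch script after this, that would point at NodeManager-side configuration rather than the submission command.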
[jira] [Created] (SPARK-6774) Implement Parquet complex types backwards-compatibility rules
Cheng Lian created SPARK-6774: - Summary: Implement Parquet complex types backwards-compatibility rules Key: SPARK-6774 URL: https://issues.apache.org/jira/browse/SPARK-6774 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0, 1.2.1, 1.1.1, 1.0.0 Reporter: Cheng Lian Assignee: Cheng Lian Before [Parquet format PR #17|https://github.com/apache/incubator-parquet-format/pull/17], representations of Parquet complex types were not standardized. Different systems just followed their own rules. This left lots of legacy Parquet files -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6774) Implement Parquet complex types backwards-compatibility rules
[ https://issues.apache.org/jira/browse/SPARK-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-6774: -- Description: [Parquet format PR #17|https://github.com/apache/incubator-parquet-format/pull/17] standardized the representation of Parquet complex types and listed backwards-compatibility rules. Spark SQL should implement these compatibility rules to improve interoperability. (was: Before [Parquet format PR #17|https://github.com/apache/incubator-parquet-format/pull/17], representations of Parquet complex types were not standardized. Different systems just followed their own rules. This left lots of legacy Parquet files ) > Implement Parquet complex types backwards-compatibility rules > - > > Key: SPARK-6774 > URL: https://issues.apache.org/jira/browse/SPARK-6774 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 1.0.0, 1.1.1, 1.2.1, 1.3.0 > Reporter: Cheng Lian > Assignee: Cheng Lian > > [Parquet format PR > #17|https://github.com/apache/incubator-parquet-format/pull/17] standardized the representation of Parquet complex types and listed backwards-compatibility rules. Spark SQL should implement these compatibility rules to improve interoperability. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6775) Simplify CatalystConverter class hierarchy and pass in Parquet schema
Cheng Lian created SPARK-6775: - Summary: Simplify CatalystConverter class hierarchy and pass in Parquet schema Key: SPARK-6775 URL: https://issues.apache.org/jira/browse/SPARK-6775 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 1.3.0, 1.2.1, 1.1.1, 1.0.2 Reporter: Cheng Lian Assignee: Cheng Lian {{CatalystConverter}} classes are used to convert Parquet records to Spark SQL row objects. Current converter implementations have the following problems: # They simply ignore the original Parquet schema, which makes adding Parquet backwards-compatibility rules impossible. # They are unnecessarily overcomplicated. # {{SpecificMutableRow}} is only used for structs whose fields are all of primitive types. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6776) Implement backwards-compatibility rules in CatalystConverters
Cheng Lian created SPARK-6776: - Summary: Implement backwards-compatibility rules in CatalystConverters Key: SPARK-6776 URL: https://issues.apache.org/jira/browse/SPARK-6776 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 1.3.0, 1.2.1, 1.1.1, 1.0.2 Reporter: Cheng Lian Assignee: Cheng Lian -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6776) Implement backwards-compatibility rules in CatalystConverters
[ https://issues.apache.org/jira/browse/SPARK-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-6776: -- Description: Spark SQL should also be able to read Parquet complex types represented in several commonly used non-standard ways, for example, legacy files written by parquet-avro, parquet-thrift, and parquet-hive. We may just follow the pattern used in {{AvroIndexedRecordConverter}}. > Implement backwards-compatibility rules in CatalystConverters > - > > Key: SPARK-6776 > URL: https://issues.apache.org/jira/browse/SPARK-6776 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 1.0.2, 1.1.1, 1.2.1, 1.3.0 > Reporter: Cheng Lian > Assignee: Cheng Lian > > Spark SQL should also be able to read Parquet complex types represented in several commonly used non-standard ways, for example, legacy files written by parquet-avro, parquet-thrift, and parquet-hive. We may just follow the pattern used in {{AvroIndexedRecordConverter}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6777) Implement backwards-compatibility rules in Parquet schema converters
Cheng Lian created SPARK-6777: - Summary: Implement backwards-compatibility rules in Parquet schema converters Key: SPARK-6777 URL: https://issues.apache.org/jira/browse/SPARK-6777 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 1.3.0, 1.2.1, 1.1.1, 1.0.2 Reporter: Cheng Lian Assignee: Cheng Lian When converting a Parquet schema to a Spark SQL schema, we should recognize commonly used legacy non-standard representations of complex types. We can follow the pattern used in Parquet's {{AvroSchemaConverter}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-6775) Simplify CatalystConverter class hierarchy and pass in Parquet schema
[ https://issues.apache.org/jira/browse/SPARK-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6775: --- Assignee: Cheng Lian (was: Apache Spark) > Simplify CatalystConverter class hierarchy and pass in Parquet schema > - > > Key: SPARK-6775 > URL: https://issues.apache.org/jira/browse/SPARK-6775 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 1.0.2, 1.1.1, 1.2.1, 1.3.0 > Reporter: Cheng Lian > Assignee: Cheng Lian -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6776) Implement backwards-compatibility rules in CatalystConverters
[ https://issues.apache.org/jira/browse/SPARK-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485221#comment-14485221 ] Apache Spark commented on SPARK-6776: - User 'liancheng' has created a pull request for this issue: https://github.com/apache/spark/pull/5422 > Implement backwards-compatibility rules in CatalystConverters > - > > Key: SPARK-6776 > URL: https://issues.apache.org/jira/browse/SPARK-6776 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 1.0.2, 1.1.1, 1.2.1, 1.3.0 > Reporter: Cheng Lian > Assignee: Cheng Lian -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-6775) Simplify CatalystConverter class hierarchy and pass in Parquet schema
[ https://issues.apache.org/jira/browse/SPARK-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6775: --- Assignee: Apache Spark (was: Cheng Lian) > Simplify CatalystConverter class hierarchy and pass in Parquet schema > - > > Key: SPARK-6775 > URL: https://issues.apache.org/jira/browse/SPARK-6775 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 1.0.2, 1.1.1, 1.2.1, 1.3.0 > Reporter: Cheng Lian > Assignee: Apache Spark -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-6776) Implement backwards-compatibility rules in CatalystConverters
[ https://issues.apache.org/jira/browse/SPARK-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6776: --- Assignee: Apache Spark (was: Cheng Lian) > Implement backwards-compatibility rules in CatalystConverters > - > > Key: SPARK-6776 > URL: https://issues.apache.org/jira/browse/SPARK-6776 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 1.0.2, 1.1.1, 1.2.1, 1.3.0 > Reporter: Cheng Lian > Assignee: Apache Spark -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-6776) Implement backwards-compatibility rules in CatalystConverters
[ https://issues.apache.org/jira/browse/SPARK-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6776: --- Assignee: Cheng Lian (was: Apache Spark) > Implement backwards-compatibility rules in CatalystConverters > - > > Key: SPARK-6776 > URL: https://issues.apache.org/jira/browse/SPARK-6776 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 1.0.2, 1.1.1, 1.2.1, 1.3.0 > Reporter: Cheng Lian > Assignee: Cheng Lian -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6775) Simplify CatalystConverter class hierarchy and pass in Parquet schema
[ https://issues.apache.org/jira/browse/SPARK-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485219#comment-14485219 ] Apache Spark commented on SPARK-6775: - User 'liancheng' has created a pull request for this issue: https://github.com/apache/spark/pull/5422 > Simplify CatalystConverter class hierarchy and pass in Parquet schema > - > > Key: SPARK-6775 > URL: https://issues.apache.org/jira/browse/SPARK-6775 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 1.0.2, 1.1.1, 1.2.1, 1.3.0 > Reporter: Cheng Lian > Assignee: Cheng Lian -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-1537: --- Assignee: Apache Spark (was: Marcelo Vanzin) > Add integration with Yarn's Application Timeline Server > --- > > Key: SPARK-1537 > URL: https://issues.apache.org/jira/browse/SPARK-1537 > Project: Spark > Issue Type: New Feature > Components: YARN >Reporter: Marcelo Vanzin >Assignee: Apache Spark > Attachments: SPARK-1537.txt, spark-1573.patch > > > It would be nice to have Spark integrate with Yarn's Application Timeline > Server (see YARN-321, YARN-1530). This would allow users running Spark on > Yarn to have a single place to go for all their history needs, and avoid > having to manage a separate service (Spark's built-in server). > At the moment, there's a working version of the ATS in the Hadoop 2.4 branch, > although there is still some ongoing work. But the basics are there, and I > wouldn't expect them to change (much) at this point. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485258#comment-14485258 ] Apache Spark commented on SPARK-1537: - User 'steveloughran' has created a pull request for this issue: https://github.com/apache/spark/pull/5423 > Add integration with Yarn's Application Timeline Server > --- > > Key: SPARK-1537 > URL: https://issues.apache.org/jira/browse/SPARK-1537 > Project: Spark > Issue Type: New Feature > Components: YARN > Reporter: Marcelo Vanzin > Assignee: Marcelo Vanzin > Attachments: SPARK-1537.txt, spark-1573.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-1537: --- Assignee: Marcelo Vanzin (was: Apache Spark) > Add integration with Yarn's Application Timeline Server > --- > > Key: SPARK-1537 > URL: https://issues.apache.org/jira/browse/SPARK-1537 > Project: Spark > Issue Type: New Feature > Components: YARN > Reporter: Marcelo Vanzin > Assignee: Marcelo Vanzin > Attachments: SPARK-1537.txt, spark-1573.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-5114) Should Evaluator be a PipelineStage
[ https://issues.apache.org/jira/browse/SPARK-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483335#comment-14483335 ] Peter Rudenko edited comment on SPARK-5114 at 4/8/15 2:14 PM: -- +1 for "should". For my use case (creating a pipeline from a config file), sometimes there is a need to do evaluation with several custom metrics (e.g. Gini norm, etc.), and sometimes there is no need because evaluation is done in another part of the system. It would be more flexible if the evaluator could be part of the pipeline. was (Author: prudenko): +1 for "should". For my use case (creating a pipeline from a config file), sometimes there is a need to do evaluation with custom metrics (e.g. Gini norm, etc.), and sometimes there is no need because evaluation is done in another part of the system. It would be more flexible if the evaluator could be part of the pipeline. > Should Evaluator be a PipelineStage > --- > > Key: SPARK-5114 > URL: https://issues.apache.org/jira/browse/SPARK-5114 > Project: Spark > Issue Type: Question > Components: ML > Affects Versions: 1.2.0 > Reporter: Joseph K. Bradley > > Pipelines can currently contain Estimators and Transformers. > Question for debate: Should Pipelines be able to contain Evaluators? > Pros: > * Evaluators take input datasets with particular schemas, which should perhaps be checked before running a Pipeline. > Cons: > * Evaluators do not transform datasets. They produce a scalar (or a few values), which makes it hard to say how they fit into a Pipeline or a PipelineModel. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6778) SQL contexts in spark-shell and pyspark should both be called sqlContext
Matei Zaharia created SPARK-6778: Summary: SQL contexts in spark-shell and pyspark should both be called sqlContext Key: SPARK-6778 URL: https://issues.apache.org/jira/browse/SPARK-6778 Project: Spark Issue Type: Bug Components: PySpark, Spark Shell Reporter: Matei Zaharia For some reason the Python one is only called sqlCtx. This is pretty confusing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-5957) Better handling of default parameter values.
[ https://issues.apache.org/jira/browse/SPARK-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng reassigned SPARK-5957: Assignee: Xiangrui Meng > Better handling of default parameter values. > > > Key: SPARK-5957 > URL: https://issues.apache.org/jira/browse/SPARK-5957 > Project: Spark > Issue Type: Sub-task > Components: ML > Affects Versions: 1.3.0 > Reporter: Xiangrui Meng > Assignee: Xiangrui Meng > > We store the default value of a parameter in the Param instance. In many cases, the default value depends on the algorithm and other parameters defined in the same algorithm. We need to think of a better approach to handling default parameter values. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5874) How to improve the current ML pipeline API?
[ https://issues.apache.org/jira/browse/SPARK-5874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-5874: - Description: I created this JIRA to collect feedback about the ML pipeline API we introduced in Spark 1.2. The target is to graduate this set of APIs in 1.4 with confidence, which requires valuable input from the community. I'll create sub-tasks for each major issue. Design doc (WIP): https://docs.google.com/a/databricks.com/document/d/1plFBPJY_PriPTuMiFYLSm7fQgD1FieP4wt3oMVKMGcc/edit# was: I created this JIRA to collect feedback about the ML pipeline API we introduced in Spark 1.2. The target is to graduate this set of APIs in 1.4 with confidence, which requires valuable input from the community. I'll create sub-tasks for each major issue. > How to improve the current ML pipeline API? > --- > > Key: SPARK-5874 > URL: https://issues.apache.org/jira/browse/SPARK-5874 > Project: Spark > Issue Type: Brainstorming > Components: ML > Reporter: Xiangrui Meng > Assignee: Xiangrui Meng > Priority: Critical > > I created this JIRA to collect feedback about the ML pipeline API we introduced in Spark 1.2. The target is to graduate this set of APIs in 1.4 with confidence, which requires valuable input from the community. I'll create sub-tasks for each major issue. > Design doc (WIP): > https://docs.google.com/a/databricks.com/document/d/1plFBPJY_PriPTuMiFYLSm7fQgD1FieP4wt3oMVKMGcc/edit# -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5957) Better handling of default parameter values.
[ https://issues.apache.org/jira/browse/SPARK-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-5957: - Description: We store the default value of a parameter in the Param instance. In many cases, the default value depends on the algorithm and other parameters defined in the same algorithm. We need to think of a better approach to handle default parameter values. The design doc was posted in the parent JIRA: https://issues.apache.org/jira/browse/SPARK-5874 was:We store the default value of a parameter in the Param instance. In many cases, the default value depends on the algorithm and other parameters defined in the same algorithm. We need to think of a better approach to handle default parameter values. > Better handling of default parameter values. > > > Key: SPARK-5957 > URL: https://issues.apache.org/jira/browse/SPARK-5957 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 1.3.0 >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > > We store the default value of a parameter in the Param instance. In many > cases, the default value depends on the algorithm and other parameters > defined in the same algorithm. We need to think of a better approach to handle > default parameter values. > The design doc was posted in the parent JIRA: > https://issues.apache.org/jira/browse/SPARK-5874 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
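The direction the issue describes can be sketched in a few lines (a sketch only; the `Param`/`get_or_default` names and the lazy-callable-default idea below are illustrative assumptions, not the actual spark.ml API): keep explicitly set values separate from defaults, and let a default be a function of the other parameters of the same algorithm.

```python
# Sketch (assumed names, not the real spark.ml API): resolve defaults lazily
# so a default can depend on the algorithm and on its other parameters.
class Param:
    def __init__(self, name, default=None):
        self.name = name
        self.default = default  # a plain value, or a callable taking the Params holder

class Params:
    def __init__(self):
        self._values = {}  # only explicitly set values live here

    def set(self, param, value):
        self._values[param.name] = value
        return self

    def get_or_default(self, param):
        if param.name in self._values:
            return self._values[param.name]
        if callable(param.default):
            # a callable default can inspect the other parameters
            return param.default(self)
        return param.default

class LogisticRegression(Params):
    max_iter = Param("maxIter", default=100)
    # illustrative: the regularization default depends on another parameter
    reg_param = Param(
        "regParam",
        default=lambda p: 0.0 if p.get_or_default(LogisticRegression.max_iter) < 50 else 0.01,
    )

lr = LogisticRegression()
print(lr.get_or_default(LogisticRegression.reg_param))  # 0.01 (maxIter defaults to 100)
lr.set(LogisticRegression.max_iter, 10)
print(lr.get_or_default(LogisticRegression.reg_param))  # 0.0
```

The point of the sketch is that nothing algorithm-specific needs to be baked into `Param` itself; the default is resolved against the holder at read time.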
[jira] [Commented] (SPARK-6677) pyspark.sql nondeterministic issue with row fields
[ https://issues.apache.org/jira/browse/SPARK-6677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485521#comment-14485521 ] Davies Liu commented on SPARK-6677: --- What is the expected output? I got this:
{code}
key: a
res1 data as row: [Row(foo=1, key=u'a')]
res2 data as row: [Row(bar=3, key=u'a', other=u'foobar')]
res1 and res2 fields: (u'foo', u'key') (u'bar', u'key', u'other')
res1 data as tuple: 1 a
res2 data as tuple: 3 a foobar
key: c
res1 data as row: []
res2 data as row: [Row(bar=4, key=u'c', other=u'barfoo')]
key: b
res1 data as row: [Row(foo=2, key=u'b')]
res2 data as row: []
{code}
> pyspark.sql nondeterministic issue with row fields > -- > > Key: SPARK-6677 > URL: https://issues.apache.org/jira/browse/SPARK-6677 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.3.0 > Environment: spark version: spark-1.3.0-bin-hadoop2.4 > python version: Python 2.7.6 > operating system: MacOS, x86_64 x86_64 x86_64 GNU/Linux >Reporter: Stefano Parmesan > Labels: pyspark, row, sql > > The following issue happens only when running pyspark in the python > interpreter; it works correctly with spark-submit. > Reading two json files containing objects with a different structure > sometimes leads to the definition of wrong Rows, where the fields of a file are > used for the other one. > I was able to write a sample code that reproduces this issue about one out of three > times; the code snippet is available at the following link, together with > some (very simple) data samples: > https://gist.github.com/armisael/e08bb4567d0a11efe2db -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6779) Move shared params to param.shared and use code gen
Xiangrui Meng created SPARK-6779: Summary: Move shared params to param.shared and use code gen Key: SPARK-6779 URL: https://issues.apache.org/jira/browse/SPARK-6779 Project: Spark Issue Type: Sub-task Components: ML Reporter: Xiangrui Meng The boilerplate code should be automatically generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6682) Deprecate static train and use builder instead for Scala/Java
[ https://issues.apache.org/jira/browse/SPARK-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485554#comment-14485554 ] Alexander Ulanov commented on SPARK-6682: - [~yuu.ishik...@gmail.com] They reside in package org.apache.spark.mllib.optimization: class LBFGS(private var gradient: Gradient, private var updater: Updater) and class GradientDescent private[mllib] (private var gradient: Gradient, private var updater: Updater). They extend the Optimizer trait, which has only one function: def optimize(data: RDD[(Double, Vector)], initialWeights: Vector): Vector. This function is limited to only one type of input: vectors and their labels. I have submitted a separate issue regarding this: https://issues.apache.org/jira/browse/SPARK-5362. 1. Right now the static methods work with hard-coded optimizers, such as LogisticRegressionWithSGD. This is not very convenient. I think moving away from static methods and using builders implies that optimizers could also be set by users. This will be a problem because the current optimizers require an Updater and a Gradient at creation time. 2. The workaround I suggested in the previous post addresses this. > Deprecate static train and use builder instead for Scala/Java > - > > Key: SPARK-6682 > URL: https://issues.apache.org/jira/browse/SPARK-6682 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.3.0 >Reporter: Joseph K. Bradley > > In MLlib, we have for some time been unofficially moving away from the old > static train() methods and moving towards builder patterns. This JIRA is to > discuss this move and (hopefully) make it official. > "Old static train()" API: > {code} > val myModel = NaiveBayes.train(myData, ...) > {code} > "New builder pattern" API: > {code} > val nb = new NaiveBayes().setLambda(0.1) > val myModel = nb.train(myData) > {code} > Pros of the builder pattern: > * Much less code when algorithms have many parameters.
Since Java does not > support default arguments, we required *many* duplicated static train() > methods (for each prefix set of arguments). > * Helps to enforce default parameters. Users should ideally not have to even > think about setting parameters if they just want to try an algorithm quickly. > * Matches spark.ml API > Cons of the builder pattern: > * In Python APIs, static train methods are more "Pythonic." > Proposal: > * Scala/Java: We should start deprecating the old static train() methods. We > must keep them for API stability, but deprecating will help with API > consistency, making it clear that everyone should use the builder pattern. > As we deprecate them, we should make sure that the builder pattern supports > all parameters. > * Python: Keep static train methods. > CC: [~mengxr] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
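The two API styles under discussion can be sketched side by side (illustrative Python; `NaiveBayes` here is a stand-in class written for this sketch, not the real MLlib one):

```python
# Sketch contrasting the old static-train style with the builder style.
class NaiveBayesModel:
    def __init__(self, lambda_):
        self.lambda_ = lambda_

class NaiveBayes:
    def __init__(self):
        self._lambda = 1.0  # the default is enforced in exactly one place

    def set_lambda(self, value):
        self._lambda = value
        return self  # returning self is what makes setter calls chainable

    def train(self, data):
        return NaiveBayesModel(self._lambda)

    # The old static style: without default arguments (as in Java), every
    # prefix of the parameter list needs its own overload of this method.
    @staticmethod
    def train_static(data, lambda_=1.0):
        return NaiveBayes().set_lambda(lambda_).train(data)

data = [(0, [1.0]), (1, [0.0])]
model = NaiveBayes().set_lambda(0.1).train(data)  # builder style
print(model.lambda_)  # 0.1
```

Users who just want to try the algorithm call `NaiveBayes().train(data)` and never touch a parameter, which is the "helps enforce defaults" point above.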
[jira] [Resolved] (SPARK-6506) python support yarn cluster mode requires SPARK_HOME to be set
[ https://issues.apache.org/jira/browse/SPARK-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-6506. --- Resolution: Fixed Fix Version/s: 1.4.0 1.3.2 Issue resolved by pull request 5405 [https://github.com/apache/spark/pull/5405] > python support yarn cluster mode requires SPARK_HOME to be set > -- > > Key: SPARK-6506 > URL: https://issues.apache.org/jira/browse/SPARK-6506 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.3.0 >Reporter: Thomas Graves > Fix For: 1.3.2, 1.4.0 > > > We added support for python running in yarn cluster mode in > https://issues.apache.org/jira/browse/SPARK-5173, but it requires that > SPARK_HOME be set in the environment variables for the application master and > executor. It doesn't have to be set to anything real, but it fails if it's not > set. See the command at the end of: https://github.com/apache/spark/pull/3976 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
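Until the fix, the workaround amounted to exporting any value at all before submitting (a sketch; `placeholder` is an arbitrary stand-in value, since the description notes the variable does not have to point at anything real):

```shell
# Before the fix, Python apps in YARN cluster mode failed unless SPARK_HOME
# existed in the environment, even though its value was never actually used.
export SPARK_HOME=placeholder   # any value works; it just has to be set
echo "SPARK_HOME=${SPARK_HOME}"
```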
[jira] [Updated] (SPARK-6506) python support yarn cluster mode requires SPARK_HOME to be set
[ https://issues.apache.org/jira/browse/SPARK-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-6506: -- Assignee: Marcelo Vanzin > python support yarn cluster mode requires SPARK_HOME to be set > -- > > Key: SPARK-6506 > URL: https://issues.apache.org/jira/browse/SPARK-6506 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.3.0 >Reporter: Thomas Graves >Assignee: Marcelo Vanzin > Fix For: 1.3.2, 1.4.0 > > > We added support for python running in yarn cluster mode in > https://issues.apache.org/jira/browse/SPARK-5173, but it requires that > SPARK_HOME be set in the environment variables for application master and > executor. It doesn't have to be set to anything real but it fails if its not > set. See the command at the end of: https://github.com/apache/spark/pull/3976 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6780) Add saveAsTextFileByKey method for PySpark
Ilya Ganelin created SPARK-6780: --- Summary: Add saveAsTextFileByKey method for PySpark Key: SPARK-6780 URL: https://issues.apache.org/jira/browse/SPARK-6780 Project: Spark Issue Type: Improvement Components: PySpark Reporter: Ilya Ganelin The PySpark API should have a method to allow saving a key-value RDD to subdirectories organized by key, as in: https://issues.apache.org/jira/browse/SPARK-3533 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6780) Add saveAsTextFileByKey method for PySpark
[ https://issues.apache.org/jira/browse/SPARK-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485577#comment-14485577 ] Ilya Ganelin commented on SPARK-6780: - SPARK-3533 defines matching methods for Scala and Java APIs. > Add saveAsTextFileByKey method for PySpark > -- > > Key: SPARK-6780 > URL: https://issues.apache.org/jira/browse/SPARK-6780 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: Ilya Ganelin > > The PySpark API should have a method to allow saving a key-value RDD to > subdirectories organized by key as in : > https://issues.apache.org/jira/browse/SPARK-3533 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6780) Add saveAsTextFileByKey method for PySpark
[ https://issues.apache.org/jira/browse/SPARK-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485582#comment-14485582 ] Ilya Ganelin commented on SPARK-6780: - This code was my attempt to implement this within PythonRDD.scala, but I ran into run-time reflection issues I could not solve.
{code}
/**
 * Output a Python RDD of key-value pairs to any Hadoop file system such that the values within
 * the rdd are written to sub-directories organized by the associated key.
 *
 * Keys and values are converted to suitable output types using either user specified converters
 * or, if not specified, [[org.apache.spark.api.python.JavaToWritableConverter]]. Post-conversion
 * types `keyClass` and `valueClass` are automatically inferred if not specified. The passed-in
 * `confAsMap` is merged with the default Hadoop conf associated with the SparkContext of
 * this RDD.
 */
def saveAsHadoopFileByKey[K, V, C <: CompressionCodec](
    pyRDD: JavaRDD[Array[Byte]],
    batchSerialized: Boolean,
    path: String,
    outputFormatClass: String,
    keyClass: String,
    valueClass: String,
    keyConverterClass: String,
    valueConverterClass: String,
    confAsMap: java.util.HashMap[String, String],
    compressionCodecClass: String) = {
  val rdd = SerDeUtil.pythonToPairRDD(pyRDD, batchSerialized)
  val (kc, vc) = getKeyValueTypes(keyClass, valueClass).getOrElse(
    inferKeyValueTypes(rdd, keyConverterClass, valueConverterClass))
  val mergedConf = getMergedConf(confAsMap, pyRDD.context.hadoopConfiguration)
  val codec = Option(compressionCodecClass).map(Utils.classForName(_).asInstanceOf[Class[C]])
  val converted = convertRDD(rdd, keyConverterClass, valueConverterClass,
    new JavaToWritableConverter)
  converted.saveAsHadoopFile(path,
    ClassUtils.primitiveToWrapper(kc),
    ClassUtils.primitiveToWrapper(vc),
    classOf[RDDMultipleTextOutputFormat[K, V]],
    new JobConf(mergedConf),
    codec = codec)
}
{code}
> Add saveAsTextFileByKey method for PySpark > -- > > Key: SPARK-6780 > URL: https://issues.apache.org/jira/browse/SPARK-6780 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: Ilya Ganelin > > The PySpark API should have a method to allow saving a key-value RDD to > subdirectories organized by key, as in: > https://issues.apache.org/jira/browse/SPARK-3533 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6780) Add saveAsTextFileByKey method for PySpark
[ https://issues.apache.org/jira/browse/SPARK-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485586#comment-14485586 ] Ilya Ganelin commented on SPARK-6780: - Matching test code:
{code}
test("saveAsHadoopFileByKey should generate a text file per key") {
  val testPairs: JavaRDD[Array[Byte]] = sc.parallelize(
    Seq(
      Array(1.toByte, 1.toByte),
      Array(2.toByte, 4.toByte),
      Array(3.toByte, 9.toByte),
      Array(4.toByte, 16.toByte),
      Array(5.toByte, 25.toByte))
  ).toJavaRDD()

  val fs = FileSystem.get(new Configuration())
  val basePath = sc.conf.get("spark.local.dir", "/tmp")
  val fullPath = basePath + "/testPath"
  fs.delete(new Path(fullPath), true)

  PythonRDD.saveAsHadoopFileByKey(
    testPairs,
    false,
    fullPath,
    classOf[RDDMultipleTextOutputFormat].toString,
    classOf[Int].toString,
    classOf[Int].toString,
    null,
    null,
    new java.util.HashMap(),
    "")

  // Test that a file was created for each key
  (1 to 5).foreach(key => {
    val testPath = new Path(fullPath + "/" + key)
    assert(fs.exists(testPath))

    // Read the file and test that the contents are the values matching that key split by line
    val input = fs.open(testPath)
    val reader = new BufferedReader(new InputStreamReader(input))
    val values = new HashSet[Int]
    val lines = Stream.continually(reader.readLine()).takeWhile(_ != null)
    lines.foreach(s => values += s.toInt)
    assert(values.contains(key * key))
  })
  fs.delete(new Path(fullPath), true)
}
{code}
> Add saveAsTextFileByKey method for PySpark > -- > > Key: SPARK-6780 > URL: https://issues.apache.org/jira/browse/SPARK-6780 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: Ilya Ganelin > > The PySpark API should have a method to allow saving a key-value RDD to > subdirectories organized by key, as in: > https://issues.apache.org/jira/browse/SPARK-3533 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
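For reference, the behaviour being requested can be sketched in plain Python on the local filesystem (a sketch only; the real method would go through Hadoop output formats as in the Scala attempt above, and `save_as_text_file_by_key` plus the `part-00000` file name are assumptions of this sketch):

```python
import os
import tempfile
from collections import defaultdict

def save_as_text_file_by_key(pairs, base_path):
    """Write each value under a subdirectory named after its key
    (local-filesystem sketch of the behaviour requested for PySpark)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    for key, values in grouped.items():
        key_dir = os.path.join(base_path, str(key))
        os.makedirs(key_dir, exist_ok=True)
        # one part file per key directory, one value per line
        with open(os.path.join(key_dir, "part-00000"), "w") as f:
            f.write("\n".join(str(v) for v in values))

out = tempfile.mkdtemp()
save_as_text_file_by_key([(1, 1), (2, 4), (3, 9)], out)
print(sorted(os.listdir(out)))  # ['1', '2', '3']
```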
[jira] [Updated] (SPARK-6753) Unit test for SPARK-3426 (in ShuffleSuite) doesn't correctly clone the SparkConf
[ https://issues.apache.org/jira/browse/SPARK-6753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-6753: -- Fix Version/s: 1.4.0 1.3.2 1.1.2 1.2.3 > Unit test for SPARK-3426 (in ShuffleSuite) doesn't correctly clone the > SparkConf > > > Key: SPARK-6753 > URL: https://issues.apache.org/jira/browse/SPARK-6753 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 1.1.1, 1.2.0, 1.3.0 >Reporter: Kay Ousterhout >Assignee: Kay Ousterhout >Priority: Minor > Fix For: 1.1.2, 1.3.2, 1.4.0, 1.2.3 > > > As a result, that test always uses the default shuffle settings, rather than > using the shuffle manager / other settings set by tests that extend > ShuffleSuite. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6753) Unit test for SPARK-3426 (in ShuffleSuite) doesn't correctly clone the SparkConf
[ https://issues.apache.org/jira/browse/SPARK-6753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-6753: -- Affects Version/s: 1.1.1 1.2.0 > Unit test for SPARK-3426 (in ShuffleSuite) doesn't correctly clone the > SparkConf > > > Key: SPARK-6753 > URL: https://issues.apache.org/jira/browse/SPARK-6753 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 1.1.1, 1.2.0, 1.3.0 >Reporter: Kay Ousterhout >Assignee: Kay Ousterhout >Priority: Minor > Fix For: 1.1.2, 1.3.2, 1.4.0, 1.2.3 > > > As a result, that test always uses the default shuffle settings, rather than > using the shuffle manager / other settings set by tests that extend > ShuffleSuite. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-6753) Unit test for SPARK-3426 (in ShuffleSuite) doesn't correctly clone the SparkConf
[ https://issues.apache.org/jira/browse/SPARK-6753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-6753. --- Resolution: Fixed Fixed by https://github.com/apache/spark/pull/5401 > Unit test for SPARK-3426 (in ShuffleSuite) doesn't correctly clone the > SparkConf > > > Key: SPARK-6753 > URL: https://issues.apache.org/jira/browse/SPARK-6753 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 1.1.1, 1.2.0, 1.3.0 >Reporter: Kay Ousterhout >Assignee: Kay Ousterhout >Priority: Minor > Fix For: 1.1.2, 1.3.2, 1.4.0, 1.2.3 > > > As a result, that test always uses the default shuffle settings, rather than > using the shuffle manager / other settings set by tests that extend > ShuffleSuite. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
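The class of bug behind SPARK-6753 is generic enough to sketch without Spark (plain dicts stand in for SparkConf; the function names are invented for this sketch): a test that rebuilds a default conf instead of cloning the one handed to it silently ignores the settings a subclass suite configured.

```python
# Sketch of the bug class: dicts standing in for SparkConf.
def run_shuffle_test(conf):
    # wrong: ignores the passed-in conf and builds a fresh default one,
    # so the test always runs with default shuffle settings
    default_conf = {"spark.shuffle.manager": "sort"}
    return default_conf["spark.shuffle.manager"]

def run_shuffle_test_fixed(conf):
    # right: clone the caller's conf so its settings are actually exercised
    cloned = dict(conf)
    return cloned["spark.shuffle.manager"]

suite_conf = {"spark.shuffle.manager": "hash"}  # set by a subclass suite
print(run_shuffle_test(suite_conf))        # sort  (bug: setting ignored)
print(run_shuffle_test_fixed(suite_conf))  # hash
```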
[jira] [Commented] (SPARK-6772) spark sql error when running code on large number of records
[ https://issues.apache.org/jira/browse/SPARK-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485708#comment-14485708 ] Michael Armbrust commented on SPARK-6772: - Can you provide your code? ArrayIndexOutOfBoundsException is often a bug in user code. > spark sql error when running code on large number of records > > > Key: SPARK-6772 > URL: https://issues.apache.org/jira/browse/SPARK-6772 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 1.2.0 >Reporter: Aditya Parmar > > Hi all, > I am getting an ArrayIndexOutOfBoundsException when I try to run a simple > column-filtering query on a file with 2.5 lakh (250,000) records. It runs fine when running > on a file with 2k records. > {code} > 15/04/08 16:54:06 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 3, > blrwfl11189.igatecorp.com, PROCESS_LOCAL, 1350 bytes) > 15/04/08 16:54:06 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 2) on > executor blrwfl11189.igatecorp.com: java.lang.ArrayIndexOutOfBoundsException > (null) [duplicate 1] > 15/04/08 16:54:06 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 4, > blrwfl11189.igatecorp.com, PROCESS_LOCAL, 1350 bytes) > 15/04/08 16:54:06 INFO TaskSetManager: Lost task 1.2 in stage 0.0 (TID 4) on > executor blrwfl11189.igatecorp.com: java.lang.ArrayIndexOutOfBoundsException > (null) [duplicate 2] > 15/04/08 16:54:06 INFO TaskSetManager: Starting task 1.3 in stage 0.0 (TID 5, > blrwfl11189.igatecorp.com, PROCESS_LOCAL, 1350 bytes) > 15/04/08 16:54:06 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 3) on > executor blrwfl11189.igatecorp.com: java.lang.ArrayIndexOutOfBoundsException > (null) [duplicate 3] > 15/04/08 16:54:06 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 6, > blrwfl11189.igatecorp.com, PROCESS_LOCAL, 1350 bytes) > 15/04/08 16:54:06 INFO TaskSetManager: Lost task 1.3 in stage 0.0 (TID 5) on > executor blrwfl11189.igatecorp.com: java.lang.ArrayIndexOutOfBoundsException > (null)
[duplicate 4] > 15/04/08 16:54:06 ERROR TaskSetManager: Task 1 in stage 0.0 failed 4 times; > aborting job > 15/04/08 16:54:06 INFO TaskSchedulerImpl: Cancelling stage 0 > 15/04/08 16:54:06 INFO TaskSchedulerImpl: Stage 0 was cancelled > 15/04/08 16:54:06 INFO DAGScheduler: Job 0 failed: saveAsTextFile at > JavaSchemaRDD.scala:42, took 1.914477 s > Exception in thread "main" org.apache.spark.SparkException: Job aborted due > to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: > Lost task 1.3 in stage 0.0 (TID 5, blrwfl11189.igatecorp.com): > java.lang.ArrayIndexOutOfBoundsException > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696) > at scala.Option.foreach(Option.scala:236) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420) > at akka.actor.Actor$class.aroundReceive(Actor.scala:465) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) > at akka.actor.ActorCell.invoke(ActorCell.scala:487) > at 
akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) > at akka.dispatch.Mailbox.run(Mailbox.scala:220) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > [aditya@blrwfl11189 ~]$ > {code} -- This message was sent by Atlassian JIRA (v6.3.4
[jira] [Updated] (SPARK-6772) spark sql error when running code on large number of records
[ https://issues.apache.org/jira/browse/SPARK-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-6772: Description: Hi all, I am getting an ArrayIndexOutOfBoundsException when I try to run a simple column-filtering query on a file with 2.5 lakh (250,000) records. It runs fine when running on a file with 2k records. {code} 15/04/08 16:54:06 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 3, blrwfl11189.igatecorp.com, PROCESS_LOCAL, 1350 bytes) 15/04/08 16:54:06 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 2) on executor blrwfl11189.igatecorp.com: java.lang.ArrayIndexOutOfBoundsException (null) [duplicate 1] 15/04/08 16:54:06 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 4, blrwfl11189.igatecorp.com, PROCESS_LOCAL, 1350 bytes) 15/04/08 16:54:06 INFO TaskSetManager: Lost task 1.2 in stage 0.0 (TID 4) on executor blrwfl11189.igatecorp.com: java.lang.ArrayIndexOutOfBoundsException (null) [duplicate 2] 15/04/08 16:54:06 INFO TaskSetManager: Starting task 1.3 in stage 0.0 (TID 5, blrwfl11189.igatecorp.com, PROCESS_LOCAL, 1350 bytes) 15/04/08 16:54:06 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 3) on executor blrwfl11189.igatecorp.com: java.lang.ArrayIndexOutOfBoundsException (null) [duplicate 3] 15/04/08 16:54:06 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 6, blrwfl11189.igatecorp.com, PROCESS_LOCAL, 1350 bytes) 15/04/08 16:54:06 INFO TaskSetManager: Lost task 1.3 in stage 0.0 (TID 5) on executor blrwfl11189.igatecorp.com: java.lang.ArrayIndexOutOfBoundsException (null) [duplicate 4] 15/04/08 16:54:06 ERROR TaskSetManager: Task 1 in stage 0.0 failed 4 times; aborting job 15/04/08 16:54:06 INFO TaskSchedulerImpl: Cancelling stage 0 15/04/08 16:54:06 INFO TaskSchedulerImpl: Stage 0 was cancelled 15/04/08 16:54:06 INFO DAGScheduler: Job 0 failed: saveAsTextFile at JavaSchemaRDD.scala:42, took 1.914477 s Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage
failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 5, blrwfl11189.igatecorp.com): java.lang.ArrayIndexOutOfBoundsException Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) at akka.dispatch.Mailbox.run(Mailbox.scala:220) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 
[aditya@blrwfl11189 ~]$ {code} was: Hi all , I am getting an Arrayoutboundsindex error when i try to run a simple filtering colums query on a file with 2.5 lac records.runs fine when running on a file with 2k records . 15/04/08 16:54:06 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 3, blrwfl11189.igatecorp.com, PROCESS_LOCAL, 1350 bytes) 15/04/08 16:54:06 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 2) on executor blrwfl11189.igatecorp.com: java.lang.ArrayIndexOutOfBoundsException (null) [duplicate 1] 15/04/08 16:54:06 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 4, blrwfl11189.igatecorp.com, PROCESS_LOCAL, 1350 bytes) 15/04/08 16:54:06 INFO TaskSetManager: Lost task 1.2 in stage 0.0 (TID 4) on exe
[jira] [Commented] (SPARK-6440) ipv6 URI for HttpServer
[ https://issues.apache.org/jira/browse/SPARK-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485750#comment-14485750 ] Apache Spark commented on SPARK-6440: - User 'nyaapa' has created a pull request for this issue: https://github.com/apache/spark/pull/5424 > ipv6 URI for HttpServer > --- > > Key: SPARK-6440 > URL: https://issues.apache.org/jira/browse/SPARK-6440 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0 > Environment: java 7 hotspot, spark 1.3.0, ipv6 only cluster >Reporter: Arsenii Krasikov >Priority: Minor > > In {{org.apache.spark.HttpServer}} the uri is generated as {code:java}"spark://" > + localHostname + ":" + masterPort{code}, where {{localHostname}} is > {code:java} org.apache.spark.util.Utils.localHostName() = > customHostname.getOrElse(localIpAddressHostname){code}. If the host has an > ipv6 address then it would be interpolated into an invalid URI: > {{spark://fe80:0:0:0:200:f8ff:fe21:67cf:42}} instead of > {{spark://[fe80:0:0:0:200:f8ff:fe21:67cf]:42}}. > The solution is to keep the URI and the hostname as separate entities. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
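The shape of the fix is small enough to sketch (illustrative Python, not the Scala patch itself; `to_uri_authority` is a name invented for this sketch): an IPv6 literal must be wrapped in brackets before a port is appended, per RFC 3986, since otherwise its colons are indistinguishable from the host:port separator.

```python
# Sketch of the required behaviour: bracket bare IPv6 literals before
# joining them with a port (RFC 3986 IP-literal syntax).
def to_uri_authority(host, port):
    if ":" in host and not host.startswith("["):  # bare IPv6 literal
        host = "[" + host + "]"
    return "%s:%d" % (host, port)

print("spark://" + to_uri_authority("fe80:0:0:0:200:f8ff:fe21:67cf", 42))
# spark://[fe80:0:0:0:200:f8ff:fe21:67cf]:42
print("spark://" + to_uri_authority("example.com", 42))
# spark://example.com:42
```

Keeping the bare hostname and the URI-ready form as separate values, as the report suggests, avoids double-bracketing when the string is reused.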
[jira] [Created] (SPARK-6781) sqlCtx -> sqlContext in pyspark shell
Michael Armbrust created SPARK-6781: --- Summary: sqlCtx -> sqlContext in pyspark shell Key: SPARK-6781 URL: https://issues.apache.org/jira/browse/SPARK-6781 Project: Spark Issue Type: Bug Components: SQL Reporter: Michael Armbrust Assignee: Davies Liu Priority: Critical We should be consistent across languages in the default names of things we add to the shells. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6781) sqlCtx -> sqlContext in pyspark shell
[ https://issues.apache.org/jira/browse/SPARK-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485757#comment-14485757 ] Michael Armbrust commented on SPARK-6781: - [~pwendell] suggests that we might alias sqlCtx so that both work to avoid a breaking change. We should probably also update documentation to be consistent. > sqlCtx -> sqlContext in pyspark shell > - > > Key: SPARK-6781 > URL: https://issues.apache.org/jira/browse/SPARK-6781 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Michael Armbrust >Assignee: Davies Liu >Priority: Critical > > We should be consistent across languages in the default names of things we > add to the shells. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
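The non-breaking option suggested here amounts to binding both names to one object at shell startup (a sketch of the idea only, not the actual pyspark shell startup code; `SQLContext` below is a stand-in class):

```python
# Sketch (not the actual pyspark/shell.py): bind both names to the same
# object so old scripts using sqlCtx keep working while documentation
# standardizes on sqlContext.
class SQLContext:  # stand-in for pyspark.sql.SQLContext
    pass

sqlContext = SQLContext()
sqlCtx = sqlContext  # deprecated alias kept for backwards compatibility

print(sqlCtx is sqlContext)  # True
```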
[jira] [Commented] (SPARK-6677) pyspark.sql nondeterministic issue with row fields
[ https://issues.apache.org/jira/browse/SPARK-6677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485772#comment-14485772 ] Stefano Parmesan commented on SPARK-6677: - Hi Davies, Thanks for taking the time to look into this; that's the expected output, in fact. The point is that back then it happened to me randomly, once every five executions or so. Now I'm not able to reproduce it anymore with the data I posted, but I was able to reproduce it again (100% of the time) with the data I just added to the gist; the exception I'm getting is:
{noformat}
$ ./bin/pyspark ./spark_test.py
[...]
key: 31491
res1 data as row: [Row(foo=31491, key=u'31491')]
res2 data as row: [Row(bar=157455, key=u'31491', other=u'foobar', some=u'thing', that=u'this', this=u'that')]
res1 and res2 fields: (u'foo', u'key') (u'bar', u'key', u'other', u'some', u'that', u'this')
res1 data as tuple: 31491 31491
res2 data as tuple: 157455 31491 foobar
key: 31497
res1 data as row: []
res2 data as row: [Row(foo=157485, key=u'31497')]
key: 31495
res1 data as row: [Traceback (most recent call last):
  File "/path/to/spark-1.3.0-bin-hadoop2.4/./spark_test.py", line 25, in <module>
    print "res1 data as row:", list(res_x)
  File "/path/to/spark-1.3.0-bin-hadoop2.4/python/pyspark/sql/types.py", line 1214, in __repr__
    for n in self.__FIELDS__))
  File "/path/to/spark-1.3.0-bin-hadoop2.4/python/pyspark/sql/types.py", line 1214, in <genexpr>
    for n in self.__FIELDS__))
IndexError: tuple index out of range
{noformat}
which is the same one I'm getting in my more complex script.
Some considerations:
1) It may be that this particular input leads to the issue only on my machine, so I've added another file to generate random inputs; I got this exception on two out of three randomly-generated samples. Please go ahead and run it with different values of {{N}} if the uploaded data does not make pyspark crash in your environment.
2) Interestingly, given the sample data, it always crashes on the same key: {{31495}}; however, it does not seem "magic" to me (and of course input files containing just those two elements do not make pyspark crash in any way):
{noformat}
data/sample_a.json:{"foo": 31495, "key": "31495"}
data/sample_b.json:{"other": "foobar", "bar": 157475, "key": "31495", "that": "this", "this": "that", "some": "thing"}
{noformat}
3) What happens is that either {{res_x.data\[0\].__FIELDS__}} or {{res_y.data\[0\].__FIELDS__}} gets the wrong field names, leading to the {{IndexError}} (there are too many fields and the row does not contain enough data).
> pyspark.sql nondeterministic issue with row fields
> --
>
> Key: SPARK-6677
> URL: https://issues.apache.org/jira/browse/SPARK-6677
> Project: Spark
> Issue Type: Bug
> Components: PySpark
>Affects Versions: 1.3.0
> Environment: spark version: spark-1.3.0-bin-hadoop2.4
> python version: Python 2.7.6
> operating system: MacOS, x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Stefano Parmesan
> Labels: pyspark, row, sql
>
> The following issue happens only when running pyspark in the python
> interpreter, it works correctly with spark-submit.
> Reading two json files containing objects with a different structure leads
> sometimes to the definition of wrong Rows, where the fields of a file are
> used for the other one.
> I was able to write a sample code that reproduce this issue one out of three > times; the code snippet is available at the following link, together with > some (very simple) data samples: > https://gist.github.com/armisael/e08bb4567d0a11efe2db
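The mechanism described in point 3 of the comment above can be simulated without Spark at all. The sketch below is a hypothetical, simplified stand-in for pyspark's generated Row classes (the real code lives in `pyspark/sql/types.py`): each schema becomes a tuple subclass with a class-level `__FIELDS__`, and if a two-element row ends up carrying the other file's six-field schema, `__repr__` indexes past the end of the tuple and raises exactly the `IndexError: tuple index out of range` shown in the traceback.

```python
# Pure-Python simulation of the reported failure mode: a Row paired with the
# wrong schema's field list crashes in __repr__. This is an illustrative
# stand-in, not pyspark's actual Row implementation.

class Row(tuple):
    __FIELDS__ = ()  # per-schema field names; mixing schemas up causes the crash

    def __repr__(self):
        return "Row(%s)" % ", ".join(
            "%s=%r" % (name, self[i]) for i, name in enumerate(self.__FIELDS__))

class RowA(Row):                      # schema of sample_a.json
    __FIELDS__ = ("foo", "key")

class RowB(Row):                      # schema of sample_b.json
    __FIELDS__ = ("bar", "key", "other", "some", "that", "this")

good = RowA((31495, "31495"))         # two fields, two values: repr works

# Attach the six-field schema to a two-element tuple, as happens when the
# wrong __FIELDS__ is picked up: indexing field 3 of a 2-tuple fails.
bad = RowB((31495, "31495"))
try:
    repr(bad)
    crashed = False
except IndexError:                    # "tuple index out of range"
    crashed = True
```

This matches the symptom in the gist: the data is fine in isolation; only the row/schema pairing is wrong.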
[jira] [Updated] (SPARK-6677) pyspark.sql nondeterministic issue with row fields
[ https://issues.apache.org/jira/browse/SPARK-6677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-6677: Assignee: Davies Liu > pyspark.sql nondeterministic issue with row fields > -- > > Key: SPARK-6677 > URL: https://issues.apache.org/jira/browse/SPARK-6677 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.3.0 > Environment: spark version: spark-1.3.0-bin-hadoop2.4 > python version: Python 2.7.6 > operating system: MacOS, x86_64 x86_64 x86_64 GNU/Linux >Reporter: Stefano Parmesan >Assignee: Davies Liu > Labels: pyspark, row, sql > > The following issue happens only when running pyspark in the python > interpreter, it works correctly with spark-submit. > Reading two json files containing objects with a different structure leads > sometimes to the definition of wrong Rows, where the fields of a file are > used for the other one. > I was able to write a sample code that reproduce this issue one out of three > times; the code snippet is available at the following link, together with > some (very simple) data samples: > https://gist.github.com/armisael/e08bb4567d0a11efe2db -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5594) SparkException: Failed to get broadcast (TorrentBroadcast)
[ https://issues.apache.org/jira/browse/SPARK-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485780#comment-14485780 ] Zach Fry commented on SPARK-5594: - [~pwendell], We saw this while running a long Spark job (>24 hours) with {{spark.cleaner.ttl}} set to 24 hours; that is, {{spark.cleaner.ttl}} triggered a clean before the job finished. We were able to repro this pretty easily by setting {{spark.cleaner.ttl}} to 60 seconds and running a small job.
{code}
java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_2_piece0 of broadcast_2
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1003)
	at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
	at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
	at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
	at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
	at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
	at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:138)
{code}
We were wondering what the purpose of {{spark.cleaner.ttl}} is. Why does it need to be set to begin with (if at all)? Won't the broadcast variables be cleaned up when the job ends and the context is shut down anyway?
> SparkException: Failed to get broadcast (TorrentBroadcast)
> --
>
> Key: SPARK-5594
> URL: https://issues.apache.org/jira/browse/SPARK-5594
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
>Affects Versions: 1.2.0, 1.3.0
>Reporter: John Sandiford
>Priority: Critical
>
> I am uncertain whether this is a bug, however I am getting the error below
> when running on a cluster (works locally), and have no idea what is causing
> it, or where to look for more information.
> Any help is appreciated. Others appear to experience the same issue, but I
> have not found any solutions online.
> Please note that this only happens with certain code and is repeatable, all > my other spark jobs work fine. > ERROR TaskSetManager: Task 3 in stage 6.0 failed 4 times; aborting job > Exception in thread "main" org.apache.spark.SparkException: Job aborted due > to stage failure: Task 3 in stage 6.0 failed 4 times, most recent failure: > Lost task 3.3 in stage 6.0 (TID 24, ): java.io.IOException: > org.apache.spark.SparkException: Failed to get broadcast_6_piece0 of > broadcast_6 > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1011) > at > org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) > at > org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) > at > org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) > at > org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87) > at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: org.apache.spark.SparkException: Failed to get broadcast_6_piece0 > of broadcast_6 > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) > at scala.Option.getOrElse(Option.scala:120) > at > 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:136) > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119) > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119) > at scala.collection.immutable.List.foreach(List.scala:318) > at > org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:119) > at > org.apache.spark
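The interaction Zach describes (a TTL-based cleaner evicting broadcast blocks while the job still needs them) can be illustrated with a small Spark-free simulation. The `BroadcastManager` class below is hypothetical, not Spark's actual cleaner code; it only shows why a long-running job plus a short `spark.cleaner.ttl` produces a "Failed to get broadcast" style error.

```python
# Toy model (not Spark's implementation) of TTL-based broadcast cleanup:
# blocks older than the TTL are evicted, so a task in a still-running job
# can no longer fetch them.

class BroadcastManager:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.blocks = {}          # block id -> (created_at, payload)

    def put(self, block_id, payload, now):
        self.blocks[block_id] = (now, payload)

    def clean(self, now):
        """What a spark.cleaner.ttl-style cleaner effectively does."""
        self.blocks = {bid: (t, p) for bid, (t, p) in self.blocks.items()
                       if now - t < self.ttl}

    def fetch(self, block_id, now):
        self.clean(now)
        if block_id not in self.blocks:
            raise IOError("Failed to get %s" % block_id)
        return self.blocks[block_id][1]

mgr = BroadcastManager(ttl_seconds=60)
mgr.put("broadcast_2_piece0", b"jobconf", now=0)

within_ttl = mgr.fetch("broadcast_2_piece0", now=30)   # still present

# The same fetch after the TTL has elapsed, mid-job, fails -- analogous to
# the "Failed to get broadcast_2_piece0 of broadcast_2" exception above.
try:
    mgr.fetch("broadcast_2_piece0", now=120)
    failed = False
except IOError:
    failed = True
```

This is also why the 60-second-TTL repro is so reliable: any job longer than the TTL that reuses a broadcast (e.g. a Hadoop job conf) will hit the eviction window.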
[jira] [Commented] (SPARK-6781) sqlCtx -> sqlContext in pyspark shell
[ https://issues.apache.org/jira/browse/SPARK-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485788#comment-14485788 ] Davies Liu commented on SPARK-6781: --- We use `sqlCtx` a lot in doc tests (will be in API docs), should we also change them? > sqlCtx -> sqlContext in pyspark shell > - > > Key: SPARK-6781 > URL: https://issues.apache.org/jira/browse/SPARK-6781 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Michael Armbrust >Assignee: Davies Liu >Priority: Critical > > We should be consistent across languages in the default names of things we > add to the shells.
[jira] [Commented] (SPARK-6781) sqlCtx -> sqlContext in pyspark shell
[ https://issues.apache.org/jira/browse/SPARK-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485801#comment-14485801 ] Michael Armbrust commented on SPARK-6781: - Yeah, I think it would be better to be consistent everywhere. > sqlCtx -> sqlContext in pyspark shell > - > > Key: SPARK-6781 > URL: https://issues.apache.org/jira/browse/SPARK-6781 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Michael Armbrust >Assignee: Davies Liu >Priority: Critical > > We should be consistent across languages in the default names of things we > add to the shells.
[jira] [Assigned] (SPARK-6781) sqlCtx -> sqlContext in pyspark shell
[ https://issues.apache.org/jira/browse/SPARK-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6781: --- Assignee: Apache Spark (was: Davies Liu) > sqlCtx -> sqlContext in pyspark shell > - > > Key: SPARK-6781 > URL: https://issues.apache.org/jira/browse/SPARK-6781 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Michael Armbrust >Assignee: Apache Spark >Priority: Critical > > We should be consistent across languages in the default names of things we > add to the shells. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-6781) sqlCtx -> sqlContext in pyspark shell
[ https://issues.apache.org/jira/browse/SPARK-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6781: --- Assignee: Davies Liu (was: Apache Spark) > sqlCtx -> sqlContext in pyspark shell > - > > Key: SPARK-6781 > URL: https://issues.apache.org/jira/browse/SPARK-6781 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Michael Armbrust >Assignee: Davies Liu >Priority: Critical > > We should be consistent across languages in the default names of things we > add to the shells. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6781) sqlCtx -> sqlContext in pyspark shell
[ https://issues.apache.org/jira/browse/SPARK-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485803#comment-14485803 ] Apache Spark commented on SPARK-6781: - User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/5425 > sqlCtx -> sqlContext in pyspark shell > - > > Key: SPARK-6781 > URL: https://issues.apache.org/jira/browse/SPARK-6781 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Michael Armbrust >Assignee: Davies Liu >Priority: Critical > > We should be consistent across languages in the default names of things we > add to the shells. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-6757) spark.sql.shuffle.partitions is global, not per connection
[ https://issues.apache.org/jira/browse/SPARK-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-6757. - Resolution: Duplicate
> spark.sql.shuffle.partitions is global, not per connection
> --
>
> Key: SPARK-6757
> URL: https://issues.apache.org/jira/browse/SPARK-6757
> Project: Spark
> Issue Type: Bug
> Components: SQL
>Affects Versions: 1.3.0
>Reporter: David Ross
>
> We are trying to use the {{spark.sql.shuffle.partitions}} parameter to handle
> large queries differently from smaller queries. We expected that this
> parameter would be respected per connection, but it seems to be global.
> For example, try this in two separate JDBC connections:
> Connection 1:
> {code}
> SET spark.sql.shuffle.partitions=10;
> SELECT * FROM some_table;
> {code}
> The correct number {{10}} was used.
> Connection 2:
> {code}
> SET spark.sql.shuffle.partitions=100;
> SELECT * FROM some_table;
> {code}
> The correct number {{100}} was used.
> Back to connection 1:
> {code}
> SELECT * FROM some_table;
> {code}
> We expected the number {{10}} to be used but {{100}} is used.
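The difference between what the reporter observed (one mutable conf shared by all connections, so the last `SET` wins everywhere) and what was expected (per-session values) can be sketched abstractly. The `Session` class below is purely illustrative, not Spark's SQLConf: each session overlays its own `SET`s on top of a shared default, so connection 1's value survives connection 2's `SET`.

```python
# Illustrative model of per-session configuration overlays -- the behavior
# the reporter expected. Names are hypothetical, not Spark's actual classes.

global_conf = {"spark.sql.shuffle.partitions": "200"}   # shared defaults

class Session:
    """A session whose SETs overlay the shared conf instead of mutating it."""
    def __init__(self):
        self.overrides = {}

    def set(self, key, value):
        self.overrides[key] = value          # visible only to this session

    def get(self, key):
        return self.overrides.get(key, global_conf[key])

conn1, conn2 = Session(), Session()
conn1.set("spark.sql.shuffle.partitions", "10")
conn2.set("spark.sql.shuffle.partitions", "100")
# conn1 still sees 10 after conn2's SET; with a single global conf (the
# reported behavior), conn2's SET would have overwritten conn1's value.
```

The bug report amounts to Spark behaving like a single shared `global_conf` mutated in place, rather than like the overlay above.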
[jira] [Updated] (SPARK-5594) SparkException: Failed to get broadcast (TorrentBroadcast)
[ https://issues.apache.org/jira/browse/SPARK-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ash updated SPARK-5594: -- Description: I am uncertain whether this is a bug, however I am getting the error below when running on a cluster (works locally), and have no idea what is causing it, or where to look for more information. Any help is appreciated. Others appear to experience the same issue, but I have not found any solutions online. Please note that this only happens with certain code and is repeatable, all my other spark jobs work fine. {noformat} ERROR TaskSetManager: Task 3 in stage 6.0 failed 4 times; aborting job Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 6.0 failed 4 times, most recent failure: Lost task 3.3 in stage 6.0 (TID 24, ): java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_6_piece0 of broadcast_6 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1011) at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87) at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.spark.SparkException: Failed to get broadcast_6_piece0 of broadcast_6 at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:136) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:119) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:174) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1008) ... 
11 more {noformat} Driver stacktrace: {noformat} at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at akka.actor.ActorCell.invoke(ActorCell.scala:456) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.fork
[jira] [Created] (SPARK-6782) add sbt-revolver plugin to sbt build
Imran Rashid created SPARK-6782: --- Summary: add sbt-revolver plugin to sbt build Key: SPARK-6782 URL: https://issues.apache.org/jira/browse/SPARK-6782 Project: Spark Issue Type: Improvement Components: Build Reporter: Imran Rashid Assignee: Imran Rashid Priority: Minor [sbt-revolver|https://github.com/spray/sbt-revolver] is a very useful sbt plugin for development. You can start & stop long-running processes without being forced to kill the entire sbt session. This can save a lot of time in the development cycle. With sbt-revolver, you run {{re-start}} to start your app in a forked jvm. It immediately gives you the sbt shell back, so you can continue to code. When you want to reload your app with whatever changes you make, you just run {{re-start}} again -- it will kill the forked jvm, recompile your code, and start the process again. (Or you can run {{re-stop}} at any time to kill the forked jvm.) I used this a ton while working on adding json support to the UI in https://issues.apache.org/jira/browse/SPARK-3454 (as the history server never stops, without this plugin I had to kill sbt between every time I'd run it manually to play with the behavior.) I don't write a lot of spark-streaming jobs, but I've also used this plugin in that case, since again my streaming jobs never terminate -- I imagine it would be really useful to anybody that is modifying streaming and wants to test out running some jobs. I'll post a PR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
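For reference, wiring the plugin into a build is a one-line change in `project/plugins.sbt`. The coordinates follow the sbt-revolver project linked above; the version number shown is illustrative and should be checked against the current release.

```scala
// project/plugins.sbt -- enables sbt-revolver for the build.
// Version is illustrative; check the sbt-revolver README for the
// current release before using it.
addSbtPlugin("io.spray" % "sbt-revolver" % "0.7.2")
```

After reloading sbt, `re-start` forks the app in its own JVM and immediately returns the shell; running `re-start` again kills the forked JVM, recompiles, and relaunches, and `re-stop` kills it at any time.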
[jira] [Updated] (SPARK-6705) MLLIB ML Pipeline's Logistic Regression has no intercept term
[ https://issues.apache.org/jira/browse/SPARK-6705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-6705: - Assignee: Omede Firouz > MLLIB ML Pipeline's Logistic Regression has no intercept term > - > > Key: SPARK-6705 > URL: https://issues.apache.org/jira/browse/SPARK-6705 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Reporter: Omede Firouz >Assignee: Omede Firouz > Fix For: 1.4.0 > > > Currently, the ML Pipeline's LogisticRegression.scala file does not allow > setting whether or not to fit an intercept term. Therefore, the pipeline > defers to LogisticRegressionWithLBFGS which does not use an intercept term. > This makes sense from a performance point of view because adding an intercept > term requires memory allocation. > However, this is undesirable statistically, since the statistical default is > usually to include an intercept term, and one needs to have a very strong > reason for not having an intercept term. > Explicitly modeling the intercept by adding a column of all 1s does not > work because LogisticRegressionWithLBFGS forces column normalization, and a > column of all 1s has 0 variance and so dividing by 0 kills it. > We should open up the API for the ML Pipeline to explicitly allow controlling > whether or not to fit an intercept. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
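The reason the all-1s-column workaround described above fails can be shown with a few lines of plain Python (no MLlib involved): standardizing a feature column divides by its standard deviation, and a constant column has standard deviation 0. The `standardize` helper below is a hypothetical stand-in for the column normalization the description attributes to LogisticRegressionWithLBFGS.

```python
# Why a column of all 1s cannot stand in for an intercept under forced
# column normalization: its standard deviation is 0, so standardizing it
# divides by zero. Illustrative helper, not MLlib's actual scaler.

def standardize(column):
    n = len(column)
    mean = sum(column) / float(n)
    var = sum((x - mean) ** 2 for x in column) / n
    std = var ** 0.5
    return [(x - mean) / std for x in column]   # std == 0 -> ZeroDivisionError

normal = standardize([1.0, 2.0, 3.0])    # an ordinary feature column is fine

try:
    standardize([1.0, 1.0, 1.0])         # the would-be intercept column
    zero_div = False
except ZeroDivisionError:
    zero_div = True
```

(A real scaler might emit NaN/inf instead of raising, but either way the constant column is destroyed, which is why the API needs an explicit fit-intercept option.)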
[jira] [Assigned] (SPARK-6782) add sbt-revolver plugin to sbt build
[ https://issues.apache.org/jira/browse/SPARK-6782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6782: --- Assignee: Apache Spark (was: Imran Rashid) > add sbt-revolver plugin to sbt build > > > Key: SPARK-6782 > URL: https://issues.apache.org/jira/browse/SPARK-6782 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Imran Rashid >Assignee: Apache Spark >Priority: Minor > > [sbt-revolver|https://github.com/spray/sbt-revolver] is a very useful sbt > plugin for development. You can start & stop long-running processes without > being forced to kill the entire sbt session. This can save a lot of time in > the development cycle. > With sbt-revolver, you run {{re-start}} to start your app in a forked jvm. > It immediately gives you the sbt shell back, so you can continue to code. > When you want to reload your app with whatever changes you make, you just run > {{re-start}} again -- it will kill the forked jvm, recompile your code, and > start the process again. (Or you can run {{re-stop}} at any time to kill the > forked jvm.) > I used this a ton while working on adding json support to the UI in > https://issues.apache.org/jira/browse/SPARK-3454 (as the history server never > stops, without this plugin I had to kill sbt between every time I'd run it > manually to play with the behavior.) I don't write a lot of spark-streaming > jobs, but I've also used this plugin in that case, since again my streaming > jobs never terminate -- I imagine it would be really useful to anybody that > is modifying streaming and wants to test out running some jobs. > I'll post a PR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-6782) add sbt-revolver plugin to sbt build
[ https://issues.apache.org/jira/browse/SPARK-6782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6782: --- Assignee: Imran Rashid (was: Apache Spark) > add sbt-revolver plugin to sbt build > > > Key: SPARK-6782 > URL: https://issues.apache.org/jira/browse/SPARK-6782 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Imran Rashid >Assignee: Imran Rashid >Priority: Minor > > [sbt-revolver|https://github.com/spray/sbt-revolver] is a very useful sbt > plugin for development. You can start & stop long-running processes without > being forced to kill the entire sbt session. This can save a lot of time in > the development cycle. > With sbt-revolver, you run {{re-start}} to start your app in a forked jvm. > It immediately gives you the sbt shell back, so you can continue to code. > When you want to reload your app with whatever changes you make, you just run > {{re-start}} again -- it will kill the forked jvm, recompile your code, and > start the process again. (Or you can run {{re-stop}} at any time to kill the > forked jvm.) > I used this a ton while working on adding json support to the UI in > https://issues.apache.org/jira/browse/SPARK-3454 (as the history server never > stops, without this plugin I had to kill sbt between every time I'd run it > manually to play with the behavior.) I don't write a lot of spark-streaming > jobs, but I've also used this plugin in that case, since again my streaming > jobs never terminate -- I imagine it would be really useful to anybody that > is modifying streaming and wants to test out running some jobs. > I'll post a PR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org