[jira] [Comment Edited] (SPARK-19226) Report failure reason from Reporter Thread

2017-02-12 Thread Yuming Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862684#comment-15862684
 ] 

Yuming Wang edited comment on SPARK-19226 at 2/13/17 2:38 AM:
--

Try increasing the ApplicationMaster's Java heap with {{spark.yarn.am.memory=2G}},
or cap the executor count at a reasonable value with
{{spark.dynamicAllocation.maxExecutors=400}}.
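
For illustration, a minimal sketch of setting both values programmatically; the
same keys can equally be passed with --conf on spark-submit (the numbers are the
suggestions above, not defaults):

{code}
import org.apache.spark.SparkConf

// Suggested starting points from the comment above; tune for your workload.
val conf = new SparkConf()
  .set("spark.yarn.am.memory", "2g")                  // ApplicationMaster heap (yarn-client mode)
  .set("spark.dynamicAllocation.maxExecutors", "400") // cap dynamic-allocation requests
{code}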


was (Author: q79969786):
Try increasing the ApplicationMaster's Java heap with {{--conf
spark.yarn.am.memory=2G}}; the default is 512m.

> Report failure reason from Reporter Thread 
> ---
>
> Key: SPARK-19226
> URL: https://issues.apache.org/jira/browse/SPARK-19226
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.0.2
> Environment: emr-5.2.1 with Zeppelin 0.6.2/Spark2.0.2 and 10 r3.xl 
> core nodes
>Reporter: Maheedhar Reddy Chappidi
>Priority: Minor
>
> With the exponential [1] increase in executor count, the Reporter thread [2]
> fails without a proper message.
> ==
> 17/01/12 09:33:44 INFO YarnAllocator: Driver requested a total number of 
> 32767 executor(s).
> 17/01/12 09:33:44 INFO YarnAllocator: Will request 24576 executor containers, 
> each with 2 cores and 5632 MB memory including 512 MB overhead
> 17/01/12 09:33:44 INFO YarnAllocator: Canceled 0 container requests (locality 
> no longer needed)
> 17/01/12 09:33:52 INFO YarnAllocator: Driver requested a total number of 
> 34419 executor(s).
> 17/01/12 09:33:52 INFO ApplicationMaster: Final app status: FAILED, exitCode: 
> 12, (reason: Exception was thrown 1 time(s) from Reporter thread.)
> 17/01/12 09:33:52 INFO YarnAllocator: Driver requested a total number of 
> 34410 executor(s).
> 17/01/12 09:33:52 INFO YarnAllocator: Driver requested a total number of 
> 34409 executor(s).
> 17/01/12 09:33:52 INFO ShutdownHookManager: Shutdown hook called
> ==
> We were able to run the workflows by limiting the max executor count
> (spark.dynamicAllocation.maxExecutors) to avoid further requests (35k->65k).
> Additionally, I don't see any issues with the ApplicationMaster's container
> memory/compute.
> Is it possible to surface a more detailed ErrorReason from the if/else?
> [1]  
> https://github.com/apache/spark/blob/6ee28423ad1b2e6089b82af64a31d77d3552bb38/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
> [2] 
> https://github.com/apache/spark/blob/01e14bf303e61a5726f3b1418357a50c1bf8b16f/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L446-L480
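
For context, a simplified, excerpt-style sketch of the failure handling in the
linked Reporter thread [2] (members such as allocator, finish and the constants
belong to the linked ApplicationMaster; this is not the literal source):

{code}
// Simplified sketch of the Reporter-thread loop in ApplicationMaster.scala:
// consecutive allocateResources() failures are counted and, past a threshold,
// the app is failed with only the count -- surfacing the exception itself
// is what this ticket asks for.
var failureCount = 0
try {
  allocator.allocateResources()   // heartbeat / container bookkeeping
  failureCount = 0
} catch {
  case e: Throwable =>
    failureCount += 1
    if (failureCount >= reporterMaxFailures) {
      finish(FinalApplicationStatus.FAILED, EXIT_REPORTER_FAILURE,
        s"Exception was thrown $failureCount time(s) from Reporter thread.")
      // e.getMessage could be appended here to report the actual reason.
    }
}
{code}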






[jira] [Commented] (SPARK-18699) Spark CSV parsing types other than String throws exception when malformed

2017-02-12 Thread Takeshi Yamamuro (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863122#comment-15863122
 ] 

Takeshi Yamamuro commented on SPARK-18699:
--

This fix seems sensible to me; what do you think? cc: [~hyukjin.kwon]
If yes, I'll open a PR based on this
(https://github.com/apache/spark/compare/master...maropu:SPARK-18699). If not, I
think it's okay to resolve this as "Won't Fix".

> Spark CSV parsing types other than String throws exception when malformed
> -
>
> Key: SPARK-18699
> URL: https://issues.apache.org/jira/browse/SPARK-18699
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Jakub Nowacki
>
> If a CSV is read and the schema contains any type other than String, an
> exception is thrown when the string value in the CSV is malformed; e.g., if a
> timestamp does not match the defined format, an exception is thrown:
> {code}
> Caused by: java.lang.IllegalArgumentException
>   at java.sql.Date.valueOf(Date.java:143)
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTime(DateTimeUtils.scala:137)
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$$anonfun$castTo$6.apply$mcJ$sp(CSVInferSchema.scala:272)
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$$anonfun$castTo$6.apply(CSVInferSchema.scala:272)
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$$anonfun$castTo$6.apply(CSVInferSchema.scala:272)
>   at scala.util.Try.getOrElse(Try.scala:79)
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:269)
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:116)
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:85)
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:128)
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:127)
>   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
>   at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply$mcV$sp(WriterContainer.scala:253)
>   at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
>   at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
>   at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1348)
>   at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:258)
>   ... 8 more
> {code}
> From what I've seen, it behaves similarly with the Integer and Long types.
> To my understanding, the PERMISSIVE and DROPMALFORMED modes should just null
> the value or drop the line, but instead they kill the job.
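
For reference, a minimal sketch of the behavior the report expects (schema and
path are placeholders):

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("csv-modes").getOrCreate()
val schema = new StructType()
  .add("id", IntegerType)
  .add("ts", TimestampType)

// Expected: PERMISSIVE nulls the unparsable field, DROPMALFORMED drops the row;
// the report is that both instead fail the job on a malformed value.
val df = spark.read
  .schema(schema)
  .option("mode", "DROPMALFORMED")
  .csv("/path/to/data.csv")   // placeholder path
{code}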






[jira] [Updated] (SPARK-19569) could not get APP ID and cause failed to connect to spark driver on yarn-client mode

2017-02-12 Thread KaiXu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXu updated SPARK-19569:
--
Description: 
When I run Hive queries on Spark, I get the error below in the console; after
checking the container's log, I found it failed to connect to the Spark driver.
I have set hive.spark.job.monitor.timeout=3600s, so the log says 'Job hasn't
been submitted after 3601s'. It is implausible that no resources were available
during this whole long period, and I also did not see any network-related
issue, so the cause is not clear from the message "Possible reasons include
network issues, errors in remote driver or the cluster has no available
resources, etc.".
From Hive's log, it failed to get the APP ID, so this might be why the driver
did not start up.

console log:
Starting Spark Job = e9ce42c8-ff20-4ac8-803f-7668678c2a00
Job hasn't been submitted after 3601s. Aborting it.
Possible reasons include network issues, errors in remote driver or the cluster 
has no available resources, etc.
Please check YARN or Spark driver's logs for further information.
Status: SENT
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask

container's log:

17/02/13 05:05:54 INFO yarn.ApplicationMaster: Preparing Local resources
17/02/13 05:05:54 INFO yarn.ApplicationMaster: Prepared Local resources 
Map(__spark_libs__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 
file: 
"/user/root/.sparkStaging/application_1486905599813_0046/__spark_libs__6842484649003444330.zip"
 } size: 153484072 timestamp: 1486926551130 type: ARCHIVE visibility: PRIVATE, 
__spark_conf__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 file: 
"/user/root/.sparkStaging/application_1486905599813_0046/__spark_conf__.zip" } 
size: 116245 timestamp: 1486926551318 type: ARCHIVE visibility: PRIVATE)
17/02/13 05:05:54 INFO yarn.ApplicationMaster: ApplicationAttemptId: 
appattempt_1486905599813_0046_02
17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls to: root
17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls to: root
17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls groups to: 
17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls groups to: 
17/02/13 05:05:54 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users  with view permissions: Set(root); groups 
with view permissions: Set(); users  with modify permissions: Set(root); groups 
with modify permissions: Set()
17/02/13 05:05:54 INFO yarn.ApplicationMaster: Waiting for Spark driver to be 
reachable.
17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect t

[jira] [Updated] (SPARK-19569) could not get APP ID and cause failed to connect to spark driver on yarn-client mode

2017-02-12 Thread KaiXu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXu updated SPARK-19569:
--
Description: 
When I run Hive queries on Spark, I get the error below in the console; after
checking the container's log, I found it failed to connect to the Spark driver.
I have set hive.spark.job.monitor.timeout=3600s, so the log says 'Job hasn't
been submitted after 3601s'. It is implausible that no resources were available
during this whole long period, and I also did not see any network-related
issue, so the cause is not clear from the message "Possible reasons include
network issues, errors in remote driver or the cluster has no available
resources, etc.".
From Hive's log, there is a TimeoutException and it failed to get the APP ID,
so this might be why the driver did not start up.

console log:
Starting Spark Job = e9ce42c8-ff20-4ac8-803f-7668678c2a00
Job hasn't been submitted after 3601s. Aborting it.
Possible reasons include network issues, errors in remote driver or the cluster 
has no available resources, etc.
Please check YARN or Spark driver's logs for further information.
Status: SENT
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask

container's log:

17/02/13 05:05:54 INFO yarn.ApplicationMaster: Preparing Local resources
17/02/13 05:05:54 INFO yarn.ApplicationMaster: Prepared Local resources 
Map(__spark_libs__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 
file: 
"/user/root/.sparkStaging/application_1486905599813_0046/__spark_libs__6842484649003444330.zip"
 } size: 153484072 timestamp: 1486926551130 type: ARCHIVE visibility: PRIVATE, 
__spark_conf__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 file: 
"/user/root/.sparkStaging/application_1486905599813_0046/__spark_conf__.zip" } 
size: 116245 timestamp: 1486926551318 type: ARCHIVE visibility: PRIVATE)
17/02/13 05:05:54 INFO yarn.ApplicationMaster: ApplicationAttemptId: 
appattempt_1486905599813_0046_02
17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls to: root
17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls to: root
17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls groups to: 
17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls groups to: 
17/02/13 05:05:54 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users  with view permissions: Set(root); groups 
with view permissions: Set(); users  with modify permissions: Set(root); groups 
with modify permissions: Set()
17/02/13 05:05:54 INFO yarn.ApplicationMaster: Waiting for Spark driver to be 
reachable.
17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMas

[jira] [Updated] (SPARK-19569) could not get APP ID and failed to connect to spark driver on yarn-client mode

2017-02-12 Thread KaiXu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXu updated SPARK-19569:
--
Summary: could not  get APP ID and failed to connect to spark driver on 
yarn-client mode  (was: could not  connect to spark driver on yarn-client mode)

> could not  get APP ID and failed to connect to spark driver on yarn-client 
> mode
> ---
>
> Key: SPARK-19569
> URL: https://issues.apache.org/jira/browse/SPARK-19569
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.2
> Environment: hadoop2.7.1
> spark2.0.2
> hive2.2
>Reporter: KaiXu
>
> When I run Hive queries on Spark, I get the error below in the console; after
> checking the container's log, I found it failed to connect to the Spark
> driver. I have set hive.spark.job.monitor.timeout=3600s, so the log says 'Job
> hasn't been submitted after 3601s'. It is implausible that no resources were
> available during this whole long period, and I also did not see any
> network-related issue, so the cause is not clear from the message "Possible
> reasons include network issues, errors in remote driver or the cluster has no
> available resources, etc.".
> console log:
> Starting Spark Job = e9ce42c8-ff20-4ac8-803f-7668678c2a00
> Job hasn't been submitted after 3601s. Aborting it.
> Possible reasons include network issues, errors in remote driver or the 
> cluster has no available resources, etc.
> Please check YARN or Spark driver's logs for further information.
> Status: SENT
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> container's log:
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Preparing Local resources
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Prepared Local resources 
> Map(__spark_libs__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 
> file: 
> "/user/root/.sparkStaging/application_1486905599813_0046/__spark_libs__6842484649003444330.zip"
>  } size: 153484072 timestamp: 1486926551130 type: ARCHIVE visibility: 
> PRIVATE, __spark_conf__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 
> 8020 file: 
> "/user/root/.sparkStaging/application_1486905599813_0046/__spark_conf__.zip" 
> } size: 116245 timestamp: 1486926551318 type: ARCHIVE visibility: PRIVATE)
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: ApplicationAttemptId: 
> appattempt_1486905599813_0046_02
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls groups to: 
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls groups to: 
> 17/02/13 05:05:54 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users  with view permissions: Set(root); groups 
> with view permissions: Set(); users  with modify permissions: Set(root); 
> groups with modify permissions: Set()
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Waiting for Spark driver to be 
> reachable.
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> a

[jira] [Updated] (SPARK-19569) could not get APP ID and cause failed to connect to spark driver on yarn-client mode

2017-02-12 Thread KaiXu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXu updated SPARK-19569:
--
Summary: could not  get APP ID and cause failed to connect to spark driver 
on yarn-client mode  (was: could not  get APP ID and failed to connect to spark 
driver on yarn-client mode)

> could not  get APP ID and cause failed to connect to spark driver on 
> yarn-client mode
> -
>
> Key: SPARK-19569
> URL: https://issues.apache.org/jira/browse/SPARK-19569
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.2
> Environment: hadoop2.7.1
> spark2.0.2
> hive2.2
>Reporter: KaiXu
>
> When I run Hive queries on Spark, I get the error below in the console; after
> checking the container's log, I found it failed to connect to the Spark
> driver. I have set hive.spark.job.monitor.timeout=3600s, so the log says 'Job
> hasn't been submitted after 3601s'. It is implausible that no resources were
> available during this whole long period, and I also did not see any
> network-related issue, so the cause is not clear from the message "Possible
> reasons include network issues, errors in remote driver or the cluster has no
> available resources, etc.".
> console log:
> Starting Spark Job = e9ce42c8-ff20-4ac8-803f-7668678c2a00
> Job hasn't been submitted after 3601s. Aborting it.
> Possible reasons include network issues, errors in remote driver or the 
> cluster has no available resources, etc.
> Please check YARN or Spark driver's logs for further information.
> Status: SENT
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> container's log:
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Preparing Local resources
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Prepared Local resources 
> Map(__spark_libs__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 
> file: 
> "/user/root/.sparkStaging/application_1486905599813_0046/__spark_libs__6842484649003444330.zip"
>  } size: 153484072 timestamp: 1486926551130 type: ARCHIVE visibility: 
> PRIVATE, __spark_conf__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 
> 8020 file: 
> "/user/root/.sparkStaging/application_1486905599813_0046/__spark_conf__.zip" 
> } size: 116245 timestamp: 1486926551318 type: ARCHIVE visibility: PRIVATE)
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: ApplicationAttemptId: 
> appattempt_1486905599813_0046_02
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls groups to: 
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls groups to: 
> 17/02/13 05:05:54 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users  with view permissions: Set(root); groups 
> with view permissions: Set(); users  with modify permissions: Set(root); 
> groups with modify permissions: Set()
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Waiting for Spark driver to be 
> reachable.
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.Applica

[jira] [Updated] (SPARK-19569) could not connect to spark driver on yarn-client mode

2017-02-12 Thread KaiXu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXu updated SPARK-19569:
--
Description: 
When I run Hive queries on Spark, I get the error below in the console; after
checking the container's log, I found it failed to connect to the Spark driver.
I have set hive.spark.job.monitor.timeout=3600s, so the log says 'Job hasn't
been submitted after 3601s'. It is implausible that no resources were available
during this whole long period, and I also did not see any network-related
issue, so the cause is not clear from the message "Possible reasons include
network issues, errors in remote driver or the cluster has no available
resources, etc.".

console log:
Starting Spark Job = e9ce42c8-ff20-4ac8-803f-7668678c2a00
Job hasn't been submitted after 3601s. Aborting it.
Possible reasons include network issues, errors in remote driver or the cluster 
has no available resources, etc.
Please check YARN or Spark driver's logs for further information.
Status: SENT
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask

container's log:

17/02/13 05:05:54 INFO yarn.ApplicationMaster: Preparing Local resources
17/02/13 05:05:54 INFO yarn.ApplicationMaster: Prepared Local resources 
Map(__spark_libs__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 
file: 
"/user/root/.sparkStaging/application_1486905599813_0046/__spark_libs__6842484649003444330.zip"
 } size: 153484072 timestamp: 1486926551130 type: ARCHIVE visibility: PRIVATE, 
__spark_conf__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 file: 
"/user/root/.sparkStaging/application_1486905599813_0046/__spark_conf__.zip" } 
size: 116245 timestamp: 1486926551318 type: ARCHIVE visibility: PRIVATE)
17/02/13 05:05:54 INFO yarn.ApplicationMaster: ApplicationAttemptId: 
appattempt_1486905599813_0046_02
17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls to: root
17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls to: root
17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls groups to: 
17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls groups to: 
17/02/13 05:05:54 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users  with view permissions: Set(root); groups 
with view permissions: Set(); users  with modify permissions: Set(root); groups 
with modify permissions: Set()
17/02/13 05:05:54 INFO yarn.ApplicationMaster: Waiting for Spark driver to be 
reachable.
17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to

[jira] [Commented] (SPARK-19570) Allow to disable hive in pyspark shell

2017-02-12 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863115#comment-15863115
 ] 

Apache Spark commented on SPARK-19570:
--

User 'zjffdu' has created a pull request for this issue:
https://github.com/apache/spark/pull/16906

> Allow to disable hive in pyspark shell
> --
>
> Key: SPARK-19570
> URL: https://issues.apache.org/jira/browse/SPARK-19570
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.1.0
>Reporter: Jeff Zhang
>Priority: Minor
>
> SPARK-15236 does this for the scala shell; this ticket is for the pyspark
> shell. This is not only for pyspark itself, but can also benefit downstream
> projects like livy, which uses shell.py for its interactive session. For now,
> livy has no control over whether hive is enabled or not.
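
For background, SPARK-15236 gates Hive support in the scala shell on the
spark.sql.catalogImplementation setting; a rough, simplified sketch of that
decision (the real shell also checks that the Hive classes are actually on the
classpath):

{code}
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// "hive" enables the Hive catalog; "in-memory" skips Hive support entirely.
val conf = new SparkConf()
val spark =
  if (conf.get("spark.sql.catalogImplementation", "hive") == "hive") {
    SparkSession.builder().config(conf).enableHiveSupport().getOrCreate()
  } else {
    SparkSession.builder().config(conf).getOrCreate()
  }
{code}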






[jira] [Assigned] (SPARK-19570) Allow to disable hive in pyspark shell

2017-02-12 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-19570:


Assignee: (was: Apache Spark)

> Allow to disable hive in pyspark shell
> --
>
> Key: SPARK-19570
> URL: https://issues.apache.org/jira/browse/SPARK-19570
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.1.0
>Reporter: Jeff Zhang
>Priority: Minor
>
> SPARK-15236 does this for the scala shell; this ticket is for the pyspark
> shell. This is not only for pyspark itself, but can also benefit downstream
> projects like livy, which uses shell.py for its interactive session. For now,
> livy has no control over whether hive is enabled or not.






[jira] [Assigned] (SPARK-19570) Allow to disable hive in pyspark shell

2017-02-12 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-19570:


Assignee: Apache Spark

> Allow to disable hive in pyspark shell
> --
>
> Key: SPARK-19570
> URL: https://issues.apache.org/jira/browse/SPARK-19570
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.1.0
>Reporter: Jeff Zhang
>Assignee: Apache Spark
>Priority: Minor
>
> SPARK-15236 does this for the scala shell; this ticket is for the pyspark
> shell. This is not only for pyspark itself, but can also benefit downstream
> projects like livy, which uses shell.py for its interactive session. For now,
> livy has no control over whether hive is enabled or not.






[jira] [Updated] (SPARK-19570) Allow to disable hive in pyspark shell

2017-02-12 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated SPARK-19570:
---
Description: SPARK-15236 does this for the scala shell; this ticket is for the
pyspark shell. This is not only for pyspark itself, but can also benefit
downstream projects like livy, which uses shell.py for its interactive session.
For now, livy has no control over whether hive is enabled or not.  (was:
SPARK-15236 does this for the scala shell; this ticket is for the pyspark
shell. This is not only for pyspark itself, but can also benefit downstream
projects like livy, which uses shell.py for its interactive session.)

> Allow to disable hive in pyspark shell
> --
>
> Key: SPARK-19570
> URL: https://issues.apache.org/jira/browse/SPARK-19570
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.1.0
>Reporter: Jeff Zhang
>Priority: Minor
>
> SPARK-15236 does this for the scala shell; this ticket is for the pyspark
> shell. This is not only for pyspark itself, but can also benefit downstream
> projects like livy, which uses shell.py for its interactive session. For now,
> livy has no control over whether hive is enabled or not.






[jira] [Updated] (SPARK-19570) Allow to disable hive in pyspark shell

2017-02-12 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated SPARK-19570:
---
Description: SPARK-15236 does this for the scala shell; this ticket is for the
pyspark shell. This is not only for pyspark itself, but can also benefit
downstream projects like livy, which uses shell.py for its interactive session.
(was: SPARK-15236 does this for the scala shell; this ticket is for the pyspark shell.)

> Allow to disable hive in pyspark shell
> --
>
> Key: SPARK-19570
> URL: https://issues.apache.org/jira/browse/SPARK-19570
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.1.0
>Reporter: Jeff Zhang
>Priority: Minor
>
> SPARK-15236 does this for the scala shell; this ticket is for the pyspark
> shell. This is not only for pyspark itself, but can also benefit downstream
> projects like livy, which uses shell.py for its interactive session.






[jira] [Updated] (SPARK-19570) Allow to disable hive in pyspark shell

2017-02-12 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated SPARK-19570:
---
Description: SPARK-15236 does this for the scala shell; this ticket is for the
pyspark shell.

> Allow to disable hive in pyspark shell
> --
>
> Key: SPARK-19570
> URL: https://issues.apache.org/jira/browse/SPARK-19570
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.1.0
>Reporter: Jeff Zhang
>Priority: Minor
>
> SPARK-15236 does this for the scala shell; this ticket is for the pyspark shell.






[jira] [Created] (SPARK-19570) Allow to disable hive in pyspark shell

2017-02-12 Thread Jeff Zhang (JIRA)
Jeff Zhang created SPARK-19570:
--

 Summary: Allow to disable hive in pyspark shell
 Key: SPARK-19570
 URL: https://issues.apache.org/jira/browse/SPARK-19570
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 2.1.0
Reporter: Jeff Zhang
Priority: Minor









[jira] [Commented] (SPARK-19569) could not connect to spark driver on yarn-client mode

2017-02-12 Thread KaiXu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863109#comment-15863109
 ] 

KaiXu commented on SPARK-19569:
---

It's not the IP address resolution issue (SPARK-5113), since 192.168.1.1 is the
client node (which is the driver node in yarn-client mode).
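
If it helps to narrow this down: the address the AM keeps retrying is whatever
the driver advertised, so pinning the driver endpoint explicitly makes a
mismatch or firewall problem easier to spot. A hedged debugging sketch (values
illustrative, not a fix):

{code}
import org.apache.spark.SparkConf

// In yarn-client mode the AM connects back to the driver on the client node.
// Pinning these makes the advertised endpoint explicit and checkable:
val conf = new SparkConf()
  .set("spark.driver.host", "192.168.1.1") // must be reachable from the YARN nodes
  .set("spark.driver.port", "43656")       // fixed port; ensure no firewall blocks it
{code}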

> could not  connect to spark driver on yarn-client mode
> --
>
> Key: SPARK-19569
> URL: https://issues.apache.org/jira/browse/SPARK-19569
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.2
> Environment: hadoop2.7.1
> spark2.0.2
> hive2.2
>Reporter: KaiXu
>
> When I run Hive queries on Spark, I get the error below in the console; after
> checking the container's log, I found it failed to connect to the Spark
> driver. I have set hive.spark.job.monitor.timeout=3600s, so the log says 'Job
> hasn't been submitted after 3601s'. It is implausible that no resources were
> available during this whole long period, and I also did not see any
> network-related issue, so the cause is not clear from the message "Possible
> reasons include network issues, errors in remote driver or the cluster has no
> available resources, etc.".
> console log:
> Starting Spark Job = e9ce42c8-ff20-4ac8-803f-7668678c2a00
> Job hasn't been submitted after 3601s. Aborting it.
> Possible reasons include network issues, errors in remote driver or the 
> cluster has no available resources, etc.
> Please check YARN or Spark driver's logs for further information.
> Status: SENT
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> container's log:
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Preparing Local resources
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Prepared Local resources 
> Map(__spark_libs__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 
> file: 
> "/user/root/.sparkStaging/application_1486905599813_0046/__spark_libs__6842484649003444330.zip"
>  } size: 153484072 timestamp: 1486926551130 type: ARCHIVE visibility: 
> PRIVATE, __spark_conf__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 
> 8020 file: 
> "/user/root/.sparkStaging/application_1486905599813_0046/__spark_conf__.zip" 
> } size: 116245 timestamp: 1486926551318 type: ARCHIVE visibility: PRIVATE)
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: ApplicationAttemptId: 
> appattempt_1486905599813_0046_02
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls groups to: 
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls groups to: 
> 17/02/13 05:05:54 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users  with view permissions: Set(root); groups 
> with view permissions: Set(); users  with modify permissions: Set(root); 
> groups with modify permissions: Set()
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Waiting for Spark driver to be 
> reachable.
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retryin

[jira] [Updated] (SPARK-19569) could not connect to spark driver on yarn-client mode

2017-02-12 Thread KaiXu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXu updated SPARK-19569:
--
Description: 
When I run Hive queries on Spark, I get the error below in the console; after
checking the container's log, I found it failed to connect to the Spark driver.
I have set hive.spark.job.monitor.timeout=3600s, so the log says 'Job hasn't
been submitted after 3601s'. It is implausible that no resources were available
during this whole long period, and I also did not see any network-related
issue, so the cause is not clear from the message "Possible reasons include
network issues, errors in remote driver or the cluster has no available
resources, etc.".

console log:
Starting Spark Job = e9ce42c8-ff20-4ac8-803f-7668678c2a00
Job hasn't been submitted after 3601s. Aborting it.
Possible reasons include network issues, errors in remote driver or the cluster 
has no available resources, etc.
Please check YARN or Spark driver's logs for further information.
Status: SENT
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask


container's log:

17/02/13 05:05:54 INFO yarn.ApplicationMaster: Preparing Local resources
17/02/13 05:05:54 INFO yarn.ApplicationMaster: Prepared Local resources 
Map(__spark_libs__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 
file: 
"/user/root/.sparkStaging/application_1486905599813_0046/__spark_libs__6842484649003444330.zip"
 } size: 153484072 timestamp: 1486926551130 type: ARCHIVE visibility: PRIVATE, 
__spark_conf__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 file: 
"/user/root/.sparkStaging/application_1486905599813_0046/__spark_conf__.zip" } 
size: 116245 timestamp: 1486926551318 type: ARCHIVE visibility: PRIVATE)
17/02/13 05:05:54 INFO yarn.ApplicationMaster: ApplicationAttemptId: 
appattempt_1486905599813_0046_02
17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls to: root
17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls to: root
17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls groups to: 
17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls groups to: 
17/02/13 05:05:54 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users  with view permissions: Set(root); groups 
with view permissions: Set(); users  with modify permissions: Set(root); groups 
with modify permissions: Set()
17/02/13 05:05:54 INFO yarn.ApplicationMaster: Waiting for Spark driver to be 
reachable.
17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed t

[jira] [Created] (SPARK-19569) could not connect to spark driver on yarn-client mode

2017-02-12 Thread KaiXu (JIRA)
KaiXu created SPARK-19569:
-

 Summary: could not  connect to spark driver on yarn-client mode
 Key: SPARK-19569
 URL: https://issues.apache.org/jira/browse/SPARK-19569
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.0.2
 Environment: hadoop2.7.1
spark2.0.2
hive2.2

Reporter: KaiXu


When I run Hive queries on Spark, I get the error below in the console; after
checking the container's log, I found it failed to connect to the Spark driver.
I have set hive.spark.job.monitor.timeout=3600s, so the log says 'Job hasn't
been submitted after 3601s'. It is implausible that no resources were available
during this whole long period, and I also did not see any network-related
issue, so the cause is not clear from the message "Possible reasons include
network issues, errors in remote driver or the cluster has no available
resources, etc.".

Starting Spark Job = e9ce42c8-ff20-4ac8-803f-7668678c2a00
Job hasn't been submitted after 3601s. Aborting it.
Possible reasons include network issues, errors in remote driver or the cluster 
has no available resources, etc.
Please check YARN or Spark driver's logs for further information.
Status: SENT
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask


container's log:

17/02/13 05:05:54 INFO yarn.ApplicationMaster: Preparing Local resources
17/02/13 05:05:54 INFO yarn.ApplicationMaster: Prepared Local resources 
Map(__spark_libs__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 
file: 
"/user/root/.sparkStaging/application_1486905599813_0046/__spark_libs__6842484649003444330.zip"
 } size: 153484072 timestamp: 1486926551130 type: ARCHIVE visibility: PRIVATE, 
__spark_conf__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 file: 
"/user/root/.sparkStaging/application_1486905599813_0046/__spark_conf__.zip" } 
size: 116245 timestamp: 1486926551318 type: ARCHIVE visibility: PRIVATE)
17/02/13 05:05:54 INFO yarn.ApplicationMaster: ApplicationAttemptId: 
appattempt_1486905599813_0046_02
17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls to: root
17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls to: root
17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls groups to: 
17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls groups to: 
17/02/13 05:05:54 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users  with view permissions: Set(root); groups 
with view permissions: Set(); users  with modify permissions: Set(root); groups 
with modify permissions: Set()
17/02/13 05:05:54 INFO yarn.ApplicationMaster: Waiting for Spark driver to be 
reachable.
17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 
192.168.1.1:43656, retrying ...
17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed

[jira] [Updated] (SPARK-19541) High Availability support for ThriftServer

2017-02-12 Thread LvDeShui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LvDeShui updated SPARK-19541:
-
Affects Version/s: (was: 1.3.1)
   2.0.2
   2.1.0

> High Availability support for ThriftServer
> --
>
> Key: SPARK-19541
> URL: https://issues.apache.org/jira/browse/SPARK-19541
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.0
>Reporter: LvDeShui
>
> Currently, we use the Spark ThriftServer frequently, and there are many
> connections between clients and a single ThriftServer. When the ThriftServer
> is down, we cannot get the service again until the server is restarted, so we
> need to consider ThriftServer HA just as with HiveServer HA. For the
> ThriftServer, we want to adopt the HiveServer HA pattern: start multiple
> thrift servers that register themselves in ZooKeeper, so that a client can
> find a thrift server just by connecting to ZooKeeper. Beeline can then get
> service from another thrift server when one is down.
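
For reference, on the client side the HiveServer2 HA pattern being proposed
looks like this: beeline connects to the ZooKeeper ensemble instead of a fixed
host and is routed to a live server. The hosts and namespace below are
illustrative:

{code}
jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
{code}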






[jira] [Commented] (SPARK-19568) Must include class/method documentation for CRAN check

2017-02-12 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863085#comment-15863085
 ] 

Felix Cheung commented on SPARK-19568:
--

I checked both master and branch-2.1 - it should still be done as part of the
make-distribution.sh call chain - we should double-check the next time we make
a release. Unfortunately, it looks like --r is not run with the nightly builds,
so I'm not sure we can confirm for now - it would be good to kick off a release
build to verify (run R CMD check --as-cran).
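
For anyone trying to reproduce the verification locally, a sketch of the
commands (paths assume a Spark source checkout; the exact release tooling is
the make-distribution.sh chain mentioned above):

{code}
R CMD build R/pkg                      # build the SparkR source package from the Spark tree
R CMD check --as-cran SparkR_*.tar.gz  # the check the comment refers to
{code}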

> Must include class/method documentation for CRAN check
> --
>
> Key: SPARK-19568
> URL: https://issues.apache.org/jira/browse/SPARK-19568
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.1.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
> While tests are running, R CMD check --as-cran is still complaining
> {code}
> * checking for missing documentation entries ... WARNING
> Undocumented code objects:
>   ‘add_months’ ‘agg’ ‘approxCountDistinct’ ‘approxQuantile’ ‘arrange’
>   ‘array_contains’ ‘as.DataFrame’ ‘as.data.frame’ ‘asc’ ‘ascii’ ‘avg’
>   ‘base64’ ‘between’ ‘bin’ ‘bitwiseNOT’ ‘bround’ ‘cache’ ‘cacheTable’
>   ‘cancelJobGroup’ ‘cast’ ‘cbrt’ ‘ceil’ ‘clearCache’ ‘clearJobGroup’
>   ‘collect’ ‘colnames’ ‘colnames<-’ ‘coltypes’ ‘coltypes<-’ ‘column’
>   ‘columns’ ‘concat’ ‘concat_ws’ ‘contains’ ‘conv’ ‘corr’ ‘count’
>   ‘countDistinct’ ‘cov’ ‘covar_pop’ ‘covar_samp’ ‘crc32’
>   ‘createDataFrame’ ‘createExternalTable’ ‘createOrReplaceTempView’
>   ‘crossJoin’ ‘crosstab’ ‘cume_dist’ ‘dapply’ ‘dapplyCollect’
>   ‘date_add’ ‘date_format’ ‘date_sub’ ‘datediff’ ‘dayofmonth’
>   ‘dayofyear’ ‘decode’ ‘dense_rank’ ‘desc’ ‘describe’ ‘distinct’ ‘drop’
> ...
> {code}
> This is because of the lack of .Rd files in a clean environment when running 
> against the contents of the R source package.
> I think we need to generate the .Rd files under man/ when building the 
> release and then include them in the package.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19568) Must include class/method documentation for CRAN check

2017-02-12 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-19568:


 Summary: Must include class/method documentation for CRAN check
 Key: SPARK-19568
 URL: https://issues.apache.org/jira/browse/SPARK-19568
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Affects Versions: 2.1.0
Reporter: Felix Cheung
Assignee: Felix Cheung






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19568) Must include class/method documentation for CRAN check

2017-02-12 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-19568:
-
Description: 
While the tests do run, R CMD check --as-cran is still complaining

{code}
* checking for missing documentation entries ... WARNING
Undocumented code objects:
  ‘add_months’ ‘agg’ ‘approxCountDistinct’ ‘approxQuantile’ ‘arrange’
  ‘array_contains’ ‘as.DataFrame’ ‘as.data.frame’ ‘asc’ ‘ascii’ ‘avg’
  ‘base64’ ‘between’ ‘bin’ ‘bitwiseNOT’ ‘bround’ ‘cache’ ‘cacheTable’
  ‘cancelJobGroup’ ‘cast’ ‘cbrt’ ‘ceil’ ‘clearCache’ ‘clearJobGroup’
  ‘collect’ ‘colnames’ ‘colnames<-’ ‘coltypes’ ‘coltypes<-’ ‘column’
  ‘columns’ ‘concat’ ‘concat_ws’ ‘contains’ ‘conv’ ‘corr’ ‘count’
  ‘countDistinct’ ‘cov’ ‘covar_pop’ ‘covar_samp’ ‘crc32’
  ‘createDataFrame’ ‘createExternalTable’ ‘createOrReplaceTempView’
  ‘crossJoin’ ‘crosstab’ ‘cume_dist’ ‘dapply’ ‘dapplyCollect’
  ‘date_add’ ‘date_format’ ‘date_sub’ ‘datediff’ ‘dayofmonth’
  ‘dayofyear’ ‘decode’ ‘dense_rank’ ‘desc’ ‘describe’ ‘distinct’ ‘drop’
...
{code}

This is because of the lack of .Rd files in a clean environment when running 
against the contents of the R source package.
I think we need to generate the .Rd files under man/ when building the release 
and then include them in the package.

> Must include class/method documentation for CRAN check
> --
>
> Key: SPARK-19568
> URL: https://issues.apache.org/jira/browse/SPARK-19568
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.1.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
> While the tests do run, R CMD check --as-cran is still complaining
> {code}
> * checking for missing documentation entries ... WARNING
> Undocumented code objects:
>   ‘add_months’ ‘agg’ ‘approxCountDistinct’ ‘approxQuantile’ ‘arrange’
>   ‘array_contains’ ‘as.DataFrame’ ‘as.data.frame’ ‘asc’ ‘ascii’ ‘avg’
>   ‘base64’ ‘between’ ‘bin’ ‘bitwiseNOT’ ‘bround’ ‘cache’ ‘cacheTable’
>   ‘cancelJobGroup’ ‘cast’ ‘cbrt’ ‘ceil’ ‘clearCache’ ‘clearJobGroup’
>   ‘collect’ ‘colnames’ ‘colnames<-’ ‘coltypes’ ‘coltypes<-’ ‘column’
>   ‘columns’ ‘concat’ ‘concat_ws’ ‘contains’ ‘conv’ ‘corr’ ‘count’
>   ‘countDistinct’ ‘cov’ ‘covar_pop’ ‘covar_samp’ ‘crc32’
>   ‘createDataFrame’ ‘createExternalTable’ ‘createOrReplaceTempView’
>   ‘crossJoin’ ‘crosstab’ ‘cume_dist’ ‘dapply’ ‘dapplyCollect’
>   ‘date_add’ ‘date_format’ ‘date_sub’ ‘datediff’ ‘dayofmonth’
>   ‘dayofyear’ ‘decode’ ‘dense_rank’ ‘desc’ ‘describe’ ‘distinct’ ‘drop’
> ...
> {code}
> This is because of the lack of .Rd files in a clean environment when running 
> against the contents of the R source package.
> I think we need to generate the .Rd files under man/ when building the 
> release and then include them in the package.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19387) CRAN tests do not run with SparkR source package

2017-02-12 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-19387:
-
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-15799

> CRAN tests do not run with SparkR source package
> 
>
> Key: SPARK-19387
> URL: https://issues.apache.org/jira/browse/SPARK-19387
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.1.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
> It looks like sparkR.session() is not installing Spark; as a result, running 
> R CMD check --as-cran SparkR_*.tar.gz fails, blocking possible submission to 
> CRAN.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-19566) Error initializing SparkContext under a Windows SYSTEM user

2017-02-12 Thread boddie (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

boddie closed SPARK-19566.
--
Resolution: Invalid

> Error initializing SparkContext under a Windows SYSTEM user
> ---
>
> Key: SPARK-19566
> URL: https://issues.apache.org/jira/browse/SPARK-19566
> Project: Spark
>  Issue Type: Bug
>  Components: Windows
>Affects Versions: 2.1.0
>Reporter: boddie
>
> We use the SparkLauncher class in our application, which runs in a 
> WebSphere Application Server (it is started as a service). When we try to 
> submit an application to Spark (running in standalone mode without Hadoop), 
> we get this error:
> ERROR SparkContext: Error initializing SparkContext.
> Exception in thread "main" java.io.IOException: failure to login
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:824)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2828)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2818)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2684)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
>   at org.apache.spark.SparkContext.addFile(SparkContext.scala:1452)
>   at org.apache.spark.SparkContext.addFile(SparkContext.scala:1425)
>   at 
> org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:470)
>   at 
> org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:470)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:470)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:117)
>   at 
> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
>   at 
> com.ibm.el.expertise.spark.MatrixCompletionRunner.main(MatrixCompletionRunner.java:46)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56)
>   at java.lang.reflect.Method.invoke(Method.java:620)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: javax.security.auth.login.LoginException: 
> java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 3
>   at com.ibm.security.auth.module.Win64System.getCurrent(Native Method)
>   at com.ibm.security.auth.module.Win64System.<init>(Win64System.java:74)
>   at 
> com.ibm.security.auth.module.Win64LoginModule.login(Win64LoginModule.java:143)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56)
>   at java.lang.reflect.Method.invoke(Method.java:620)
>   at javax.security.auth.login.LoginContext.invoke(LoginContext.java:781)
>   at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:215)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:706)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:704)
>   at 
> java.security.AccessController.doPrivileged(AccessController.java:456)
>   at 
> javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:703)
>   at javax.security.auth.login.LoginContext.login(LoginContext.java:609)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:799)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2828)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2818)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2684)
>   at org.apache.hadoop.fs.FileSystem.get(Fil

[jira] [Assigned] (SPARK-19567) Support some Schedulable variables immutability and access

2017-02-12 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-19567:


Assignee: Apache Spark

> Support some Schedulable variables immutability and access
> --
>
> Key: SPARK-19567
> URL: https://issues.apache.org/jira/browse/SPARK-19567
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Affects Versions: 2.1.0
>Reporter: Eren Avsarogullari
>Assignee: Apache Spark
>Priority: Minor
>
> Support some Schedulable variables immutability and access
> Some Schedulable variables need refactoring for immutability and access 
> modifiers, as follows:
> - from vars to vals (where mutability is not required): this is important to 
> support immutability as much as possible.
> Sample => Pool: weight, minShare, priority, name and 
> taskSetSchedulingAlgorithm.
> - access modifiers: in particular, access to vars needs to be restricted from 
> other parts of the codebase to prevent potential side effects.
> Sample => TaskSetManager: tasksSuccessful, totalResultSize, calculatedTasks, 
> etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-19567) Support some Schedulable variables immutability and access

2017-02-12 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-19567:


Assignee: (was: Apache Spark)

> Support some Schedulable variables immutability and access
> --
>
> Key: SPARK-19567
> URL: https://issues.apache.org/jira/browse/SPARK-19567
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Affects Versions: 2.1.0
>Reporter: Eren Avsarogullari
>Priority: Minor
>
> Support some Schedulable variables immutability and access
> Some Schedulable variables need refactoring for immutability and access 
> modifiers, as follows:
> - from vars to vals (where mutability is not required): this is important to 
> support immutability as much as possible.
> Sample => Pool: weight, minShare, priority, name and 
> taskSetSchedulingAlgorithm.
> - access modifiers: in particular, access to vars needs to be restricted from 
> other parts of the codebase to prevent potential side effects.
> Sample => TaskSetManager: tasksSuccessful, totalResultSize, calculatedTasks, 
> etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19567) Support some Schedulable variables immutability and access

2017-02-12 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863005#comment-15863005
 ] 

Apache Spark commented on SPARK-19567:
--

User 'erenavsarogullari' has created a pull request for this issue:
https://github.com/apache/spark/pull/16905

> Support some Schedulable variables immutability and access
> --
>
> Key: SPARK-19567
> URL: https://issues.apache.org/jira/browse/SPARK-19567
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Affects Versions: 2.1.0
>Reporter: Eren Avsarogullari
>Priority: Minor
>
> Support some Schedulable variables immutability and access
> Some Schedulable variables need refactoring for immutability and access 
> modifiers, as follows:
> - from vars to vals (where mutability is not required): this is important to 
> support immutability as much as possible.
> Sample => Pool: weight, minShare, priority, name and 
> taskSetSchedulingAlgorithm.
> - access modifiers: in particular, access to vars needs to be restricted from 
> other parts of the codebase to prevent potential side effects.
> Sample => TaskSetManager: tasksSuccessful, totalResultSize, calculatedTasks, 
> etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19567) Support some Schedulable variables immutability and access

2017-02-12 Thread Eren Avsarogullari (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862983#comment-15862983
 ] 

Eren Avsarogullari commented on SPARK-19567:


Hi [~srowen],

Thanks for the quick response.

I prepared the following patch for this; it is related to 
https://github.com/apache/spark/pull/15604 
https://github.com/erenavsarogullari/spark/commit/d158b789c98923d5989f9fd50fd7fd3a4f1fc1ff

Also, I totally agree about public APIs; the suggested patch is at the scheduler 
implementation level (the private[spark] Schedulable entities and 
TaskSchedulerImpl), so it is not part of the public APIs.

> Support some Schedulable variables immutability and access
> --
>
> Key: SPARK-19567
> URL: https://issues.apache.org/jira/browse/SPARK-19567
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Affects Versions: 2.1.0
>Reporter: Eren Avsarogullari
>Priority: Minor
>
> Support some Schedulable variables immutability and access
> Some Schedulable variables need refactoring for immutability and access 
> modifiers, as follows:
> - from vars to vals (where mutability is not required): this is important to 
> support immutability as much as possible.
> Sample => Pool: weight, minShare, priority, name and 
> taskSetSchedulingAlgorithm.
> - access modifiers: in particular, access to vars needs to be restricted from 
> other parts of the codebase to prevent potential side effects.
> Sample => TaskSetManager: tasksSuccessful, totalResultSize, calculatedTasks, 
> etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19566) Error initializing SparkContext under a Windows SYSTEM user

2017-02-12 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862958#comment-15862958
 ] 

Sean Owen commented on SPARK-19566:
---

Those are errors from a WebSphere library and/or Hadoop, not Spark.

> Error initializing SparkContext under a Windows SYSTEM user
> ---
>
> Key: SPARK-19566
> URL: https://issues.apache.org/jira/browse/SPARK-19566
> Project: Spark
>  Issue Type: Bug
>  Components: Windows
>Affects Versions: 2.1.0
>Reporter: boddie
>
> We use the SparkLauncher class in our application, which runs in a 
> WebSphere Application Server (it is started as a service). When we try to 
> submit an application to Spark (running in standalone mode without Hadoop), 
> we get this error:
> ERROR SparkContext: Error initializing SparkContext.
> Exception in thread "main" java.io.IOException: failure to login
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:824)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2828)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2818)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2684)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
>   at org.apache.spark.SparkContext.addFile(SparkContext.scala:1452)
>   at org.apache.spark.SparkContext.addFile(SparkContext.scala:1425)
>   at 
> org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:470)
>   at 
> org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:470)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:470)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:117)
>   at 
> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
>   at 
> com.ibm.el.expertise.spark.MatrixCompletionRunner.main(MatrixCompletionRunner.java:46)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56)
>   at java.lang.reflect.Method.invoke(Method.java:620)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: javax.security.auth.login.LoginException: 
> java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 3
>   at com.ibm.security.auth.module.Win64System.getCurrent(Native Method)
>   at com.ibm.security.auth.module.Win64System.<init>(Win64System.java:74)
>   at 
> com.ibm.security.auth.module.Win64LoginModule.login(Win64LoginModule.java:143)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56)
>   at java.lang.reflect.Method.invoke(Method.java:620)
>   at javax.security.auth.login.LoginContext.invoke(LoginContext.java:781)
>   at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:215)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:706)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:704)
>   at 
> java.security.AccessController.doPrivileged(AccessController.java:456)
>   at 
> javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:703)
>   at javax.security.auth.login.LoginContext.login(LoginContext.java:609)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:799)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2828)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2818)
>   at org.

[jira] [Resolved] (SPARK-7101) Spark SQL should support java.sql.Time

2017-02-12 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-7101.
--
Resolution: Not A Problem

> Spark SQL should support java.sql.Time
> --
>
> Key: SPARK-7101
> URL: https://issues.apache.org/jira/browse/SPARK-7101
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.2.1
> Environment: All
>Reporter: Peter Hagelund
>Priority: Minor
>
> Several RDBMSes support the TIME data type; for more exact mapping between 
> those and Spark SQL, support for java.sql.Time with an associated 
> DataType.TimeType would be helpful.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19567) Support some Schedulable variables immutability and access

2017-02-12 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862956#comment-15862956
 ] 

Sean Owen commented on SPARK-19567:
---

There are probably a thousand improvements like this that we could make. If 
there's even a modest argument that it would improve correctness, readability, 
speed, etc., I think you could go ahead and fix up a subsection of the code. The 
constraint is that we couldn't really change public APIs. Is that the case 
here?

> Support some Schedulable variables immutability and access
> --
>
> Key: SPARK-19567
> URL: https://issues.apache.org/jira/browse/SPARK-19567
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Affects Versions: 2.1.0
>Reporter: Eren Avsarogullari
>Priority: Minor
>
> Support some Schedulable variables immutability and access
> Some Schedulable variables need refactoring for immutability and access 
> modifiers, as follows:
> - from vars to vals (where mutability is not required): this is important to 
> support immutability as much as possible.
> Sample => Pool: weight, minShare, priority, name and 
> taskSetSchedulingAlgorithm.
> - access modifiers: in particular, access to vars needs to be restricted from 
> other parts of the codebase to prevent potential side effects.
> Sample => TaskSetManager: tasksSuccessful, totalResultSize, calculatedTasks, 
> etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19562) Gitignore Misses Folder dev/pr-deps

2017-02-12 Thread Armin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862954#comment-15862954
 ] 

Armin Braun commented on SPARK-19562:
-

PR added https://github.com/apache/spark/pull/16904 

> Gitignore Misses Folder dev/pr-deps
> ---
>
> Key: SPARK-19562
> URL: https://issues.apache.org/jira/browse/SPARK-19562
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: Applies to all Environments
>Reporter: Armin Braun
>Priority: Trivial
>
> It's basically in the title.
> Running the build and tests as instructed by the README creates the folder 
> `dev/pr-deps`, which is not covered by the .gitignore, leaving us with this:
> {code:none}
> ➜  spark git:(master) ✗ git status
>   
> On branch master
> Your branch is up-to-date with 'origin/master'.
> Untracked files:
>   (use "git add ..." to include in what will be committed)
>   dev/pr-deps/
> {code}
> I think that folder should be added to the gitignore.
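
Concretely, the fix is a one-line addition to the repository's .gitignore 
(presumably what the linked pull request does; stated here as an assumption 
rather than verified against the PR):

{code:none}
dev/pr-deps/
{code}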



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19562) Gitignore Misses Folder dev/pr-deps

2017-02-12 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862953#comment-15862953
 ] 

Apache Spark commented on SPARK-19562:
--

User 'original-brownbear' has created a pull request for this issue:
https://github.com/apache/spark/pull/16904

> Gitignore Misses Folder dev/pr-deps
> ---
>
> Key: SPARK-19562
> URL: https://issues.apache.org/jira/browse/SPARK-19562
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: Applies to all Environments
>Reporter: Armin Braun
>Priority: Trivial
>
> It's basically in the title.
> Running the build and tests as instructed by the README creates the folder 
> `dev/pr-deps`, which is not covered by the .gitignore, leaving us with this:
> {code:none}
> ➜  spark git:(master) ✗ git status
>   
> On branch master
> Your branch is up-to-date with 'origin/master'.
> Untracked files:
>   (use "git add ..." to include in what will be committed)
>   dev/pr-deps/
> {code}
> I think that folder should be added to the gitignore.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-19562) Gitignore Misses Folder dev/pr-deps

2017-02-12 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-19562:


Assignee: Apache Spark

> Gitignore Misses Folder dev/pr-deps
> ---
>
> Key: SPARK-19562
> URL: https://issues.apache.org/jira/browse/SPARK-19562
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: Applies to all Environments
>Reporter: Armin Braun
>Assignee: Apache Spark
>Priority: Trivial
>
> It's basically in the title.
> Running the build and tests as instructed by the README creates the folder 
> `dev/pr-deps`, which is not covered by the .gitignore, leaving us with this:
> {code:none}
> ➜  spark git:(master) ✗ git status
>   
> On branch master
> Your branch is up-to-date with 'origin/master'.
> Untracked files:
>   (use "git add ..." to include in what will be committed)
>   dev/pr-deps/
> {code}
> I think that folder should be added to the gitignore.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-19562) Gitignore Misses Folder dev/pr-deps

2017-02-12 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-19562:


Assignee: (was: Apache Spark)

> Gitignore Misses Folder dev/pr-deps
> ---
>
> Key: SPARK-19562
> URL: https://issues.apache.org/jira/browse/SPARK-19562
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: Applies to all Environments
>Reporter: Armin Braun
>Priority: Trivial
>
> It's basically in the title.
> Running the build and tests as instructed by the README creates the folder 
> `dev/pr-deps`, which is not covered by the .gitignore, leaving us with this:
> {code:none}
> ➜  spark git:(master) ✗ git status
>   
> On branch master
> Your branch is up-to-date with 'origin/master'.
> Untracked files:
>   (use "git add ..." to include in what will be committed)
>   dev/pr-deps/
> {code}
> I think that folder should be added to the gitignore.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19567) Support some Schedulable variables immutability and access

2017-02-12 Thread Eren Avsarogullari (JIRA)
Eren Avsarogullari created SPARK-19567:
--

 Summary: Support some Schedulable variables immutability and access
 Key: SPARK-19567
 URL: https://issues.apache.org/jira/browse/SPARK-19567
 Project: Spark
  Issue Type: Improvement
  Components: Scheduler
Affects Versions: 2.1.0
Reporter: Eren Avsarogullari
Priority: Minor


Support some Schedulable variables immutability and access

Some Schedulable variables need refactoring for immutability and access 
modifiers, as follows:
- from vars to vals (where mutability is not required): this is important to 
support immutability as much as possible.
Sample => Pool: weight, minShare, priority, name and taskSetSchedulingAlgorithm.
- access modifiers: in particular, access to vars needs to be restricted from 
other parts of the codebase to prevent potential side effects.
Sample => TaskSetManager: tasksSuccessful, totalResultSize, calculatedTasks, 
etc.
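
To make the proposal concrete, here is a small illustrative Scala sketch; the 
class and member names are hypothetical, not the actual Pool/TaskSetManager 
code:

{code}
package org.apache.spark.scheduler.sketch

// Before: constructor fields are vars, and the counter can be reassigned by
// any code that holds a reference to the pool.
private[spark] class PoolBefore(var weight: Int, var minShare: Int) {
  var tasksSuccessful: Int = 0
}

// After: fields that nothing reassigns become vals, and the counter is
// exposed read-only; mutation is confined to the scheduler package.
private[spark] class PoolAfter(val weight: Int, val minShare: Int) {
  private var _tasksSuccessful: Int = 0
  def tasksSuccessful: Int = _tasksSuccessful
  private[scheduler] def recordTaskSuccess(): Unit = _tasksSuccessful += 1
}
{code}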



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6628) ClassCastException occurs when executing sql statement "insert into" on hbase table

2017-02-12 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-6628.
--
Resolution: Not A Problem

> ClassCastException occurs when executing sql statement "insert into" on hbase 
> table
> ---
>
> Key: SPARK-6628
> URL: https://issues.apache.org/jira/browse/SPARK-6628
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: meiyoula
>
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 1 in stage 3.0 failed 4 times, most recent failure: Lost task 1.3 in 
> stage 3.0 (TID 12, vm-17): java.lang.ClassCastException: 
> org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat cannot be cast to 
> org.apache.hadoop.hive.ql.io.HiveOutputFormat
> at 
> org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat$lzycompute(hiveWriterContainers.scala:72)
> at 
> org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat(hiveWriterContainers.scala:71)
> at 
> org.apache.spark.sql.hive.SparkHiveWriterContainer.getOutputName(hiveWriterContainers.scala:91)
> at 
> org.apache.spark.sql.hive.SparkHiveWriterContainer.initWriters(hiveWriterContainers.scala:115)
> at 
> org.apache.spark.sql.hive.SparkHiveWriterContainer.executorSideSetup(hiveWriterContainers.scala:84)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:112)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:93)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:93)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:56)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19562) Gitignore Misses Folder dev/pr-deps

2017-02-12 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862947#comment-15862947
 ] 

Sean Owen commented on SPARK-19562:
---

That's fine, make a PR?

> Gitignore Misses Folder dev/pr-deps
> ---
>
> Key: SPARK-19562
> URL: https://issues.apache.org/jira/browse/SPARK-19562
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: Applies to all Environments
>Reporter: Armin Braun
>Priority: Trivial
>
> It's basically in the title.
> Running the build and tests as instructed by the README creates the folder 
> `dev/pr-deps`, which is not covered by the .gitignore, leaving us with this:
> {code:none}
> ➜  spark git:(master) ✗ git status
>   
> On branch master
> Your branch is up-to-date with 'origin/master'.
> Untracked files:
>   (use "git add ..." to include in what will be committed)
>   dev/pr-deps/
> {code}
> I think that folder should be added to the gitignore.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19566) Error initializing SparkContext under a Windows SYSTEM user

2017-02-12 Thread Petr Bouska (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Bouska updated SPARK-19566:

Description: 
We use the SparkLauncher class in our application, which runs in a WebSphere 
Application Server (it is started as a service). When we try to submit an 
application to Spark (running in standalone mode without Hadoop), we get this 
error:
ERROR SparkContext: Error initializing SparkContext.
Exception in thread "main" java.io.IOException: failure to login
at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:824)
at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
at 
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2828)
at 
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2818)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2684)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1452)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1425)
at 
org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:470)
at 
org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:470)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:470)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:117)
at 
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
at 
com.ibm.el.expertise.spark.MatrixCompletionRunner.main(MatrixCompletionRunner.java:46)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56)
at java.lang.reflect.Method.invoke(Method.java:620)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: javax.security.auth.login.LoginException: 
java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 3
at com.ibm.security.auth.module.Win64System.getCurrent(Native Method)
at com.ibm.security.auth.module.Win64System.<init>(Win64System.java:74)
at 
com.ibm.security.auth.module.Win64LoginModule.login(Win64LoginModule.java:143)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56)
at java.lang.reflect.Method.invoke(Method.java:620)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:781)
at 
javax.security.auth.login.LoginContext.access$000(LoginContext.java:215)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:706)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:704)
at 
java.security.AccessController.doPrivileged(AccessController.java:456)
at 
javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:703)
at javax.security.auth.login.LoginContext.login(LoginContext.java:609)
at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:799)
at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
at 
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2828)
at 
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2818)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2684)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1452)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1425)
at 
org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:470)
at 
org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:470)
at scala.collection.immutable.Lis

[jira] [Updated] (SPARK-19566) Error initializing SparkContext under a Windows SYSTEM user

2017-02-12 Thread Petr Bouska (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Bouska updated SPARK-19566:

Description: 
We use the SparkLauncher class in our application, which runs in a WebSphere 
Application Server (it is started as a service). When we try to submit an 
application to Spark (running in standalone mode without Hadoop), we get 
this error:
ERROR SparkContext: Error initializing SparkContext.
Exception in thread "main" java.io.IOException: failure to login
at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:824)
at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
at 
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2828)
at 
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2818)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2684)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1452)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1425)
at 
org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:470)
at 
org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:470)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:470)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:117)
at 
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
at 
com.ibm.el.expertise.spark.MatrixCompletionRunner.main(MatrixCompletionRunner.java:46)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56)
at java.lang.reflect.Method.invoke(Method.java:620)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: javax.security.auth.login.LoginException: 
java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 3
at com.ibm.security.auth.module.Win64System.getCurrent(Native Method)
at com.ibm.security.auth.module.Win64System.<init>(Win64System.java:74)
at 
com.ibm.security.auth.module.Win64LoginModule.login(Win64LoginModule.java:143)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56)
at java.lang.reflect.Method.invoke(Method.java:620)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:781)
at 
javax.security.auth.login.LoginContext.access$000(LoginContext.java:215)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:706)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:704)
at 
java.security.AccessController.doPrivileged(AccessController.java:456)
at 
javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:703)
at javax.security.auth.login.LoginContext.login(LoginContext.java:609)
at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:799)
at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
at 
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2828)
at 
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2818)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2684)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1452)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1425)
at 
org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:470)
at 
org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:470)
at scala.collection.immutable

[jira] [Created] (SPARK-19566) Error initializing SparkContext under a Windows SYSTEM user

2017-02-12 Thread Petr Bouska (JIRA)
Petr Bouska created SPARK-19566:
---

 Summary: Error initializing SparkContext under a Windows SYSTEM 
user
 Key: SPARK-19566
 URL: https://issues.apache.org/jira/browse/SPARK-19566
 Project: Spark
  Issue Type: Bug
  Components: Windows
Affects Versions: 2.1.0
Reporter: Petr Bouska


We use the SparkLauncher class in our application, which runs in a WebSphere 
Application Server (it is started as a service). When we try to submit an 
application to Spark, we get this error:
ERROR SparkContext: Error initializing SparkContext.
Exception in thread "main" java.io.IOException: failure to login
at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:824)
at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
at 
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2828)
at 
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2818)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2684)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1452)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1425)
at 
org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:470)
at 
org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:470)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:470)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:117)
at 
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
at 
com.ibm.el.expertise.spark.MatrixCompletionRunner.main(MatrixCompletionRunner.java:46)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56)
at java.lang.reflect.Method.invoke(Method.java:620)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: javax.security.auth.login.LoginException: 
java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 3
at com.ibm.security.auth.module.Win64System.getCurrent(Native Method)
at com.ibm.security.auth.module.Win64System.<init>(Win64System.java:74)
at 
com.ibm.security.auth.module.Win64LoginModule.login(Win64LoginModule.java:143)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56)
at java.lang.reflect.Method.invoke(Method.java:620)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:781)
at 
javax.security.auth.login.LoginContext.access$000(LoginContext.java:215)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:706)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:704)
at 
java.security.AccessController.doPrivileged(AccessController.java:456)
at 
javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:703)
at javax.security.auth.login.LoginContext.login(LoginContext.java:609)
at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:799)
at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
at 
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2828)
at 
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2818)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2684)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1452)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1425)
at 
org.apache.spark.SparkContext$$anonfun$12.apply(S

[jira] [Updated] (SPARK-19319) SparkR Kmeans summary returns error when the cluster size doesn't equal to k

2017-02-12 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-19319:
-
Target Version/s: 2.1.1, 2.2.0  (was: 2.2.0)

> SparkR Kmeans summary returns error when the cluster size doesn't equal to k
> 
>
> Key: SPARK-19319
> URL: https://issues.apache.org/jira/browse/SPARK-19319
> Project: Spark
>  Issue Type: Bug
>  Components: ML, SparkR
>Reporter: Miao Wang
>Assignee: Miao Wang
> Fix For: 2.1.1, 2.2.0
>
>
> When KMeans uses initMode = "random" with some random seeds, it is possible 
> that the actual number of clusters doesn't equal the configured `k`.
> In this case, summary(model) returns an error because the number of columns 
> of the coefficient matrix doesn't equal k.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19319) SparkR Kmeans summary returns error when the cluster size doesn't equal to k

2017-02-12 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-19319:
-
Fix Version/s: 2.1.1

> SparkR Kmeans summary returns error when the cluster size doesn't equal to k
> 
>
> Key: SPARK-19319
> URL: https://issues.apache.org/jira/browse/SPARK-19319
> Project: Spark
>  Issue Type: Bug
>  Components: ML, SparkR
>Reporter: Miao Wang
>Assignee: Miao Wang
> Fix For: 2.1.1, 2.2.0
>
>
> When KMeans uses initMode = "random" with some random seeds, it is possible 
> that the actual number of clusters doesn't equal the configured `k`.
> In this case, summary(model) returns an error because the number of columns 
> of the coefficient matrix doesn't equal k.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19342) Datatype timestamp is converted to numeric in collect method

2017-02-12 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-19342:
-
Target Version/s: 2.1.1, 2.2.0
   Fix Version/s: 2.2.0
  2.1.1

> Datatype timestamp is converted to numeric in collect method 
> -
>
> Key: SPARK-19342
> URL: https://issues.apache.org/jira/browse/SPARK-19342
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.1.0
>Reporter: Fangzhou Yang
> Fix For: 2.1.1, 2.2.0
>
>
> collect() returns double instead of POSIXct for a timestamp column when an 
> NA exists at the top of the column.
> The following code and output show how the bug can be reproduced:
> {code}
> > sparkR.session(master = "local")
> Spark package found in SPARK_HOME: /home/titicaca/spark-2.1
> Launching java with spark-submit command 
> /home/titicaca/spark-2.1/bin/spark-submit   sparkr-shell 
> /tmp/RtmpqmpZUg/backend_port363a898be92 
> Java ref type org.apache.spark.sql.SparkSession id 1 
> > df <- data.frame(col1 = c(0, 1, 2), 
> +  col2 = c(as.POSIXct("2017-01-01 00:00:01"), NA, 
> as.POSIXct("2017-01-01 12:00:01")))
> > sdf1 <- createDataFrame(df)
> > print(dtypes(sdf1))
> [[1]]
> [1] "col1"   "double"
> [[2]]
> [1] "col2"  "timestamp"
> > df1 <- collect(sdf1)
> > print(lapply(df1, class))
> $col1
> [1] "numeric"
> $col2
> [1] "POSIXct" "POSIXt" 
> > sdf2 <- filter(sdf1, "col1 > 0")
> > print(dtypes(sdf2))
> [[1]]
> [1] "col1"   "double"
> [[2]]
> [1] "col2"  "timestamp"
> > df2 <- collect(sdf2)
> > print(lapply(df2, class))
> $col1
> [1] "numeric"
> $col2
> [1] "numeric"
> {code}
> As we can see, the data type of col2 is unexpectedly converted to numeric in 
> the collected local data frame df2.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19564) KafkaOffsetReader's consumers should not be in the same group

2017-02-12 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862847#comment-15862847
 ] 

Apache Spark commented on SPARK-19564:
--

User 'lw-lin' has created a pull request for this issue:
https://github.com/apache/spark/pull/16900

> KafkaOffsetReader's consumers should not be in the same group
> -
>
> Key: SPARK-19564
> URL: https://issues.apache.org/jira/browse/SPARK-19564
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Liwei Lin
>Priority: Minor
>
> In `KafkaOffsetReader`, when an error occurs, we abort the existing consumer 
> and create a new consumer. In the current implementation, the first consumer 
> and the second consumer end up in the same group, which violates our 
> intention that the two consumers not be in the same group.
> The cause is that the first consumer is created before `groupId` and `nextId` 
> are initialized in the constructor. So even if `groupId` and `nextId` are 
> advanced while that first consumer is created, they are then reset to their 
> default values by the constructor's initializers.
> We should make sure that `groupId` and `nextId` are initialized before any 
> consumer is created.
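
A minimal Scala sketch of the ordering problem described above; the class, 
field, and method names are simplified stand-ins, not the actual 
KafkaOffsetReader code:

{code}
class OffsetReaderSketch(prefix: String) {
  // The first consumer is "created" here, before groupId/nextId below have
  // been initialized, so nextGroupId() sees their JVM defaults (null / 0).
  private var consumerGroup: String = nextGroupId()

  // Field initializers run in declaration order, so these two lines execute
  // *after* the call above and reset both fields, discarding the increment.
  private var groupId: String = null
  private var nextId: Int = 0

  private def nextGroupId(): String = {
    groupId = s"$prefix-$nextId"
    nextId += 1
    groupId
  }

  // Called after an error aborts the old consumer: because nextId was reset
  // to 0, the replacement consumer gets the same group id as the first one.
  def recreateConsumer(): String = {
    consumerGroup = nextGroupId()
    consumerGroup
  }
}
{code}

Here `new OffsetReaderSketch("g").recreateConsumer()` returns "g-0", the id the 
first consumer already used; declaring `groupId` and `nextId` before the first 
consumer is created, as the description suggests, restores distinct group ids.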



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-19559) Fix flaky KafkaSourceSuite.subscribing topic by pattern with topic deletions

2017-02-12 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-19559:


Assignee: Apache Spark

> Fix flaky KafkaSourceSuite.subscribing topic by pattern with topic deletions
> 
>
> Key: SPARK-19559
> URL: https://issues.apache.org/jira/browse/SPARK-19559
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 2.1.0
>Reporter: Kay Ousterhout
>Assignee: Apache Spark
>
> This test has recently started failing frequently; e.g., 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72720/testReport/junit/org.apache.spark.sql.kafka010/KafkaSourceSuite/subscribing_topic_by_pattern_with_topic_deletions/
>  and 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72725/testReport/org.apache.spark.sql.kafka010/KafkaSourceSuite/subscribing_topic_by_pattern_with_topic_deletions/
> cc [~zsxwing] and [~tcondie], who seem to have modified the related code 
> most recently.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19559) Fix flaky KafkaSourceSuite.subscribing topic by pattern with topic deletions

2017-02-12 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862817#comment-15862817
 ] 

Apache Spark commented on SPARK-19559:
--

User 'lw-lin' has created a pull request for this issue:
https://github.com/apache/spark/pull/16902

> Fix flaky KafkaSourceSuite.subscribing topic by pattern with topic deletions
> 
>
> Key: SPARK-19559
> URL: https://issues.apache.org/jira/browse/SPARK-19559
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 2.1.0
>Reporter: Kay Ousterhout
>
> This test has recently started failing frequently; e.g., 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72720/testReport/junit/org.apache.spark.sql.kafka010/KafkaSourceSuite/subscribing_topic_by_pattern_with_topic_deletions/
>  and 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72725/testReport/org.apache.spark.sql.kafka010/KafkaSourceSuite/subscribing_topic_by_pattern_with_topic_deletions/
> cc [~zsxwing] and [~tcondie], who seem to have modified the related code most 
> recently.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-19559) Fix flaky KafkaSourceSuite.subscribing topic by pattern with topic deletions

2017-02-12 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-19559:


Assignee: (was: Apache Spark)

> Fix flaky KafkaSourceSuite.subscribing topic by pattern with topic deletions
> 
>
> Key: SPARK-19559
> URL: https://issues.apache.org/jira/browse/SPARK-19559
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 2.1.0
>Reporter: Kay Ousterhout
>
> This test has recently started failing frequently; e.g., 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72720/testReport/junit/org.apache.spark.sql.kafka010/KafkaSourceSuite/subscribing_topic_by_pattern_with_topic_deletions/
>  and 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72725/testReport/org.apache.spark.sql.kafka010/KafkaSourceSuite/subscribing_topic_by_pattern_with_topic_deletions/
> cc [~zsxwing] and [~tcondie], who seem to have modified the related code most 
> recently.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19564) KafkaOffsetReader's consumers should not be in the same group

2017-02-12 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862805#comment-15862805
 ] 

Apache Spark commented on SPARK-19564:
--

User 'lw-lin' has created a pull request for this issue:
https://github.com/apache/spark/pull/16902

> KafkaOffsetReader's consumers should not be in the same group
> -
>
> Key: SPARK-19564
> URL: https://issues.apache.org/jira/browse/SPARK-19564
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Liwei Lin
>Priority: Minor
>
> In `KafkaOffsetReader`, when an error occurs, we abort the existing consumer 
> and create a new one. In our current implementation, the first consumer and 
> the second consumer end up in the same group, which violates our intention 
> that the two consumers not be in the same group.
> The cause is that, in our current implementation, the first consumer is 
> created before `groupId` and `nextId` are initialized in the constructor. So 
> even if `groupId` and `nextId` are updated during the creation of that first 
> consumer, they are then overwritten by their default initializers in the 
> constructor.
> We should make sure that `groupId` and `nextId` are initialized before any 
> consumer is created.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-19565) After a fetch failure, successes from the old stage attempt should be taken as valid.

2017-02-12 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-19565:


Assignee: Apache Spark

> After a fetch failure, successes from the old stage attempt should be taken 
> as valid.
> 
>
> Key: SPARK-19565
> URL: https://issues.apache.org/jira/browse/SPARK-19565
> Project: Spark
>  Issue Type: Test
>  Components: Scheduler
>Affects Versions: 2.1.0
>Reporter: jin xing
>Assignee: Apache Spark
>
> This is related to SPARK-19263. 
> When a fetch fails, the stage is resubmitted, so there can be running tasks 
> from both the old and the new stage attempt. A success from a task of the old 
> attempt should be taken as valid, and its partitionId should be removed from 
> the stage's pendingPartitions accordingly. Once pendingPartitions is empty, 
> the downstream stage can be scheduled, even though tasks may still be running 
> in the active (new) attempt.
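
A hedged sketch of the proposed bookkeeping (hypothetical names, heavily simplified; the real logic lives in Spark's DAGScheduler):

{code:scala}
import scala.collection.mutable

// Hypothetical, simplified sketch -- not the real DAGScheduler API.
case class TaskEnd(stageId: Int, stageAttemptId: Int, partitionId: Int)

class StageState(val stageId: Int, partitions: Seq[Int]) {
  var latestAttemptId: Int = 0
  val pendingPartitions: mutable.HashSet[Int] = mutable.HashSet(partitions: _*)
}

class SketchScheduler {
  private val stages = mutable.HashMap[Int, StageState]()

  def submit(stageId: Int, partitions: Seq[Int]): Unit =
    stages(stageId) = new StageState(stageId, partitions)

  // A fetch failure resubmits the stage under a new attempt id, but tasks
  // from the old attempt may still be running -- and may still succeed.
  def onFetchFailed(stageId: Int): Unit =
    stages(stageId).latestAttemptId += 1

  def onTaskSuccess(event: TaskEnd): Unit = {
    val stage = stages(event.stageId)
    // The key point of this issue: do NOT gate this on
    // event.stageAttemptId == stage.latestAttemptId; a success from the
    // old attempt is still valid output for that partition.
    stage.pendingPartitions -= event.partitionId
    if (stage.pendingPartitions.isEmpty)
      println(s"stage ${stage.stageId} complete; schedule downstream stages")
  }
}
{code}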



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19565) After a fetch failure, successes from the old stage attempt should be taken as valid.

2017-02-12 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862793#comment-15862793
 ] 

Apache Spark commented on SPARK-19565:
--

User 'jinxing64' has created a pull request for this issue:
https://github.com/apache/spark/pull/16901

> After a fetch failure, successes from the old stage attempt should be taken 
> as valid.
> 
>
> Key: SPARK-19565
> URL: https://issues.apache.org/jira/browse/SPARK-19565
> Project: Spark
>  Issue Type: Test
>  Components: Scheduler
>Affects Versions: 2.1.0
>Reporter: jin xing
>
> This is related to SPARK-19263. 
> When a fetch fails, the stage is resubmitted, so there can be running tasks 
> from both the old and the new stage attempt. A success from a task of the old 
> attempt should be taken as valid, and its partitionId should be removed from 
> the stage's pendingPartitions accordingly. Once pendingPartitions is empty, 
> the downstream stage can be scheduled, even though tasks may still be running 
> in the active (new) attempt.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-19565) After a fetch failure, successes from the old stage attempt should be taken as valid.

2017-02-12 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-19565:


Assignee: (was: Apache Spark)

> After a fetch failure, successes from the old stage attempt should be taken 
> as valid.
> 
>
> Key: SPARK-19565
> URL: https://issues.apache.org/jira/browse/SPARK-19565
> Project: Spark
>  Issue Type: Test
>  Components: Scheduler
>Affects Versions: 2.1.0
>Reporter: jin xing
>
> This is related to SPARK-19263. 
> When a fetch fails, the stage is resubmitted, so there can be running tasks 
> from both the old and the new stage attempt. A success from a task of the old 
> attempt should be taken as valid, and its partitionId should be removed from 
> the stage's pendingPartitions accordingly. Once pendingPartitions is empty, 
> the downstream stage can be scheduled, even though tasks may still be running 
> in the active (new) attempt.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19565) After a fetch failure, successes from the old stage attempt should be taken as valid.

2017-02-12 Thread jin xing (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jin xing updated SPARK-19565:
-
Description: 
This is related to SPARK-19263. 
When a fetch fails, the stage is resubmitted, so there can be running tasks 
from both the old and the new stage attempt. A success from a task of the old 
attempt should be taken as valid, and its partitionId should be removed from 
the stage's pendingPartitions accordingly. Once pendingPartitions is empty, 
the downstream stage can be scheduled, even though tasks may still be running 
in the active (new) attempt.

> After a fetch failure, successes from the old stage attempt should be taken 
> as valid.
> 
>
> Key: SPARK-19565
> URL: https://issues.apache.org/jira/browse/SPARK-19565
> Project: Spark
>  Issue Type: Test
>  Components: Scheduler
>Affects Versions: 2.1.0
>Reporter: jin xing
>
> This is related to SPARK-19263. 
> When a fetch fails, the stage is resubmitted, so there can be running tasks 
> from both the old and the new stage attempt. A success from a task of the old 
> attempt should be taken as valid, and its partitionId should be removed from 
> the stage's pendingPartitions accordingly. Once pendingPartitions is empty, 
> the downstream stage can be scheduled, even though tasks may still be running 
> in the active (new) attempt.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19565) After a fetch failure, successes from the old stage attempt should be taken as valid.

2017-02-12 Thread jin xing (JIRA)
jin xing created SPARK-19565:


 Summary: After a fetch failure, successes from the old stage attempt 
should be taken as valid.
 Key: SPARK-19565
 URL: https://issues.apache.org/jira/browse/SPARK-19565
 Project: Spark
  Issue Type: Test
  Components: Scheduler
Affects Versions: 2.1.0
Reporter: jin xing






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19558) Provide a config option to attach QueryExecutionListener to SparkSession

2017-02-12 Thread Song Jun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862768#comment-15862768
 ] 

Song Jun commented on SPARK-19558:
--

Isn't sparkSession.listenerManager.register enough?

> Provide a config option to attach QueryExecutionListener to SparkSession
> 
>
> Key: SPARK-19558
> URL: https://issues.apache.org/jira/browse/SPARK-19558
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Salil Surendran
>
> Provide a configuration property (just like spark.extraListeners) to attach 
> a QueryExecutionListener to a SparkSession.
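
For context, the programmatic registration mentioned in the comment above looks roughly as follows; the proposed property would let users attach such a listener via configuration alone (a sketch only, with an illustrative listener body):

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

object ListenerDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("listener-demo")
      .master("local[*]")
      .getOrCreate()

    // Manual registration -- what the proposed property would automate,
    // analogous to how spark.extraListeners attaches SparkListeners.
    spark.listenerManager.register(new QueryExecutionListener {
      override def onSuccess(funcName: String, qe: QueryExecution,
                             durationNs: Long): Unit =
        println(s"$funcName succeeded in ${durationNs / 1e6} ms")

      override def onFailure(funcName: String, qe: QueryExecution,
                             exception: Exception): Unit =
        println(s"$funcName failed: ${exception.getMessage}")
    })

    spark.range(10).count() // an action that fires the listener
    spark.stop()
  }
}
{code}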



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19564) KafkaOffsetReader's consumers should not be in the same group

2017-02-12 Thread Liwei Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liwei Lin updated SPARK-19564:
--
Description: 
In `KafkaOffsetReader`, when an error occurs, we abort the existing consumer 
and create a new one. In our current implementation, the first consumer and 
the second consumer end up in the same group, which violates our intention 
that the two consumers not be in the same group.

The cause is that, in our current implementation, the first consumer is 
created before `groupId` and `nextId` are initialized in the constructor. So 
even if `groupId` and `nextId` are updated during the creation of that first 
consumer, they are then overwritten by their default initializers in the 
constructor.

We should make sure that `groupId` and `nextId` are initialized before any 
consumer is created.

  was:
In KafkaOffsetReader, when error occurs, we abort the existing consumer and 
create a new consumer. In our current implementation, the first consumer and 
the second consumer would be in the same group, which violates our intention of 
the two consumers not being in the same group.

The cause is that, in our current implementation, the first consumer is created 
before `groupId` and `nextId` are initialized in the constructor. Then even if 
`groupId` and `nextId` are increased during the creation of that first 
consumer, `groupId` and `nextId` would still be initialized to default values 
in the constructor.

We should make sure that `groupId` and `nextId` are initialized before the 
creation of the any consumer.


> KafkaOffsetReader's consumers should not be in the same group
> -
>
> Key: SPARK-19564
> URL: https://issues.apache.org/jira/browse/SPARK-19564
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Liwei Lin
>Priority: Minor
>
> In `KafkaOffsetReader`, when an error occurs, we abort the existing consumer 
> and create a new one. In our current implementation, the first consumer and 
> the second consumer end up in the same group, which violates our intention 
> that the two consumers not be in the same group.
> The cause is that, in our current implementation, the first consumer is 
> created before `groupId` and `nextId` are initialized in the constructor. So 
> even if `groupId` and `nextId` are updated during the creation of that first 
> consumer, they are then overwritten by their default initializers in the 
> constructor.
> We should make sure that `groupId` and `nextId` are initialized before any 
> consumer is created.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19564) KafkaOffsetReader's consumers should not be in the same group

2017-02-12 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862751#comment-15862751
 ] 

Apache Spark commented on SPARK-19564:
--

User 'lw-lin' has created a pull request for this issue:
https://github.com/apache/spark/pull/16900

> KafkaOffsetReader's consumers should not be in the same group
> -
>
> Key: SPARK-19564
> URL: https://issues.apache.org/jira/browse/SPARK-19564
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Liwei Lin
>Priority: Minor
>
> In KafkaOffsetReader, when an error occurs, we abort the existing consumer 
> and create a new one. In our current implementation, the first consumer and 
> the second consumer end up in the same group, which violates our intention 
> that the two consumers not be in the same group.
> The cause is that, in our current implementation, the first consumer is 
> created before `groupId` and `nextId` are initialized in the constructor. So 
> even if `groupId` and `nextId` are updated during the creation of that first 
> consumer, they are then overwritten by their default initializers in the 
> constructor.
> We should make sure that `groupId` and `nextId` are initialized before the 
> creation of any consumer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-19564) KafkaOffsetReader's consumers should not be in the same group

2017-02-12 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-19564:


Assignee: (was: Apache Spark)

> KafkaOffsetReader's consumers should not be in the same group
> -
>
> Key: SPARK-19564
> URL: https://issues.apache.org/jira/browse/SPARK-19564
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Liwei Lin
>Priority: Minor
>
> In KafkaOffsetReader, when an error occurs, we abort the existing consumer 
> and create a new one. In our current implementation, the first consumer and 
> the second consumer end up in the same group, which violates our intention 
> that the two consumers not be in the same group.
> The cause is that, in our current implementation, the first consumer is 
> created before `groupId` and `nextId` are initialized in the constructor. So 
> even if `groupId` and `nextId` are updated during the creation of that first 
> consumer, they are then overwritten by their default initializers in the 
> constructor.
> We should make sure that `groupId` and `nextId` are initialized before the 
> creation of any consumer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-19564) KafkaOffsetReader's consumers should not be in the same group

2017-02-12 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-19564:


Assignee: Apache Spark

> KafkaOffsetReader's consumers should not be in the same group
> -
>
> Key: SPARK-19564
> URL: https://issues.apache.org/jira/browse/SPARK-19564
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Liwei Lin
>Assignee: Apache Spark
>Priority: Minor
>
> In KafkaOffsetReader, when an error occurs, we abort the existing consumer 
> and create a new one. In our current implementation, the first consumer and 
> the second consumer end up in the same group, which violates our intention 
> that the two consumers not be in the same group.
> The cause is that, in our current implementation, the first consumer is 
> created before `groupId` and `nextId` are initialized in the constructor. So 
> even if `groupId` and `nextId` are updated during the creation of that first 
> consumer, they are then overwritten by their default initializers in the 
> constructor.
> We should make sure that `groupId` and `nextId` are initialized before the 
> creation of any consumer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19564) KafkaOffsetReader's consumers should not be in the same group

2017-02-12 Thread Liwei Lin (JIRA)
Liwei Lin created SPARK-19564:
-

 Summary: KafkaOffsetReader's consumers should not be in the same 
group
 Key: SPARK-19564
 URL: https://issues.apache.org/jira/browse/SPARK-19564
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 2.1.1, 2.2.0
Reporter: Liwei Lin
Priority: Minor


In KafkaOffsetReader, when an error occurs, we abort the existing consumer 
and create a new one. In our current implementation, the first consumer and 
the second consumer end up in the same group, which violates our intention 
that the two consumers not be in the same group.

The cause is that, in our current implementation, the first consumer is 
created before `groupId` and `nextId` are initialized in the constructor. So 
even if `groupId` and `nextId` are updated during the creation of that first 
consumer, they are then overwritten by their default initializers in the 
constructor.

We should make sure that `groupId` and `nextId` are initialized before the 
creation of any consumer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-19559) Fix flaky KafkaSourceSuite.subscribing topic by pattern with topic deletions

2017-02-12 Thread Liwei Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862721#comment-15862721
 ] 

Liwei Lin edited comment on SPARK-19559 at 2/12/17 10:07 AM:
-

I think I found the root cause; will submit a patch soon.


was (Author: lwlin):
I think I found the root cause of this; will submit a patch soon.

> Fix flaky KafkaSourceSuite.subscribing topic by pattern with topic deletions
> 
>
> Key: SPARK-19559
> URL: https://issues.apache.org/jira/browse/SPARK-19559
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 2.1.0
>Reporter: Kay Ousterhout
>
> This test has recently started failing frequently; e.g., 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72720/testReport/junit/org.apache.spark.sql.kafka010/KafkaSourceSuite/subscribing_topic_by_pattern_with_topic_deletions/
>  and 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72725/testReport/org.apache.spark.sql.kafka010/KafkaSourceSuite/subscribing_topic_by_pattern_with_topic_deletions/
> cc [~zsxwing] and [~tcondie], who seem to have modified the related code most 
> recently.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19559) Fix flaky KafkaSourceSuite.subscribing topic by pattern with topic deletions

2017-02-12 Thread Liwei Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862721#comment-15862721
 ] 

Liwei Lin commented on SPARK-19559:
---

I think I found the root cause of this; will submit a patch soon.

> Fix flaky KafkaSourceSuite.subscribing topic by pattern with topic deletions
> 
>
> Key: SPARK-19559
> URL: https://issues.apache.org/jira/browse/SPARK-19559
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 2.1.0
>Reporter: Kay Ousterhout
>
> This test has recently started failing frequently; e.g., 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72720/testReport/junit/org.apache.spark.sql.kafka010/KafkaSourceSuite/subscribing_topic_by_pattern_with_topic_deletions/
>  and 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72725/testReport/org.apache.spark.sql.kafka010/KafkaSourceSuite/subscribing_topic_by_pattern_with_topic_deletions/
> cc [~zsxwing] and [~tcondie], who seem to have modified the related code most 
> recently.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-19563) avoid unnecessary sort in FileFormatWriter

2017-02-12 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-19563:


Assignee: Wenchen Fan  (was: Apache Spark)

> avoid unnecessary sort in FileFormatWriter
> ---
>
> Key: SPARK-19563
> URL: https://issues.apache.org/jira/browse/SPARK-19563
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-19563) avoid unnecessary sort in FileFormatWriter

2017-02-12 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-19563:


Assignee: Apache Spark  (was: Wenchen Fan)

> avoid unnecessary sort in FileFormatWriter
> ---
>
> Key: SPARK-19563
> URL: https://issues.apache.org/jira/browse/SPARK-19563
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19563) avoid unnecessary sort in FileFormatWriter

2017-02-12 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862708#comment-15862708
 ] 

Apache Spark commented on SPARK-19563:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/16898

> avoid unnecessary sort in FileFormatWriter
> ---
>
> Key: SPARK-19563
> URL: https://issues.apache.org/jira/browse/SPARK-19563
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19563) avoid unnecessary sort in FileFormatWriter

2017-02-12 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-19563:
---

 Summary: avoid unnecessary sort in FileFormatWriter
 Key: SPARK-19563
 URL: https://issues.apache.org/jira/browse/SPARK-19563
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.2.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19562) Gitignore Misses Folder dev/pr-deps

2017-02-12 Thread Armin Braun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armin Braun updated SPARK-19562:

Description: 
It's basically in the title.
Running the build and tests as instructed by the README creates the folder 
`dev/pr-deps`, which is not covered by the gitignore, leaving us with this:

{code:none}
➜  spark git:(master) ✗ git status  

On branch master
Your branch is up-to-date with 'origin/master'.
Untracked files:
  (use "git add ..." to include in what will be committed)

dev/pr-deps/
{code}

I think that folder should be added to the gitignore.

  was:
It's basically in the title.
Running the build and tests as instructed by the Readme creates the folder 
`dev/pr-deps` that is not covered by the gitignore leaving us with this:

{code:bash}
➜  spark git:(master) ✗ git status  

On branch master
Your branch is up-to-date with 'origin/master'.
Untracked files:
  (use "git add ..." to include in what will be committed)

dev/pr-deps/
{code}

I think that folder should be added to the gitignore.


> Gitignore Misses Folder dev/pr-deps
> ---
>
> Key: SPARK-19562
> URL: https://issues.apache.org/jira/browse/SPARK-19562
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: Applies to all Environments
>Reporter: Armin Braun
>Priority: Trivial
>
> It's basically in the title.
> Running the build and tests as instructed by the README creates the folder 
> `dev/pr-deps`, which is not covered by the gitignore, leaving us with this:
> {code:none}
> ➜  spark git:(master) ✗ git status
>   
> On branch master
> Your branch is up-to-date with 'origin/master'.
> Untracked files:
>   (use "git add ..." to include in what will be committed)
>   dev/pr-deps/
> {code}
> I think that folder should be added to the gitignore.
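
The fix itself would be a one-line addition (exact placement within the file is a committer's call):

{code:none}
# proposed .gitignore entry
dev/pr-deps/
{code}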



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19562) Gitignore Misses Folder dev/pr-deps

2017-02-12 Thread Armin Braun (JIRA)
Armin Braun created SPARK-19562:
---

 Summary: Gitignore Misses Folder dev/pr-deps
 Key: SPARK-19562
 URL: https://issues.apache.org/jira/browse/SPARK-19562
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.1.0
 Environment: Applies to all Environments
Reporter: Armin Braun
Priority: Trivial


It's basically in the title.
Running the build and tests as instructed by the README creates the folder 
`dev/pr-deps`, which is not covered by the gitignore, leaving us with this:

{code:bash}
➜  spark git:(master) ✗ git status  

On branch master
Your branch is up-to-date with 'origin/master'.
Untracked files:
  (use "git add ..." to include in what will be committed)

dev/pr-deps/
{code}

I think that folder should be added to the gitignore.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org