[jira] [Commented] (SPARK-2541) Standalone mode can't access secure HDFS anymore
[ https://issues.apache.org/jira/browse/SPARK-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151406#comment-15151406 ] Henry Saputra commented on SPARK-2541: -- Based on the discussion on https://github.com/apache/spark/pull/2320 it seems we should not close this as a duplicate of https://issues.apache.org/jira/browse/SPARK-3438. This should cover the case where a standalone cluster is used to access secure HDFS in a single-user scenario. > Standalone mode can't access secure HDFS anymore > > > Key: SPARK-2541 > URL: https://issues.apache.org/jira/browse/SPARK-2541 > Project: Spark > Issue Type: Bug > Components: Deploy > Affects Versions: 1.0.0, 1.0.1 > Reporter: Thomas Graves > Attachments: SPARK-2541-partial.patch > > > In Spark 0.9.x you could access secure HDFS from a standalone deploy; that > no longer works in 1.x. > It looks like the issue is in SparkHadoopUtil.runAsSparkUser. Previously it > wouldn't do the doAs if currentUser == user. Not sure how this behaves when > the daemons run as a super user but SPARK_USER is set to someone else. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
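The doAs behavior described above can be sketched in isolation. This is a hypothetical, simplified model of the pre-1.0 logic (plain Scala, no Hadoop dependency): when the requested user matches the current user, the body runs directly under the existing login context, preserving Kerberos credentials; otherwise it would be wrapped in a doAs, which discards them. The names `runAsUser` and `currentUser` are illustrative stand-ins, not Spark's actual API.

```scala
// Hypothetical sketch of the doAs decision discussed in SPARK-2541.
// In real Spark this branch would wrap the body in
// UserGroupInformation.createRemoteUser(user).doAs(...); here we only
// return a label so the chosen path is observable.
object RunAsSketch {
  def currentUser: String = sys.props.getOrElse("user.name", "unknown")

  // Returns which path was taken along with the body's result.
  def runAsUser[T](user: String)(body: => T): (String, T) =
    if (user == currentUser) {
      // No doAs: keep the existing login context (and its Kerberos tickets).
      ("direct", body)
    } else {
      // doAs path: run as a remote-user UGI, losing current credentials.
      ("doAs", body)
    }
}
```

The pre-1.0 behavior corresponds to the "direct" branch; always taking the "doAs" branch is what broke secure-HDFS access in standalone mode.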
[jira] [Comment Edited] (SPARK-5158) Allow for keytab-based HDFS security in Standalone mode
[ https://issues.apache.org/jira/browse/SPARK-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149309#comment-15149309 ] Henry Saputra edited comment on SPARK-5158 at 2/16/16 9:14 PM: --- Hi all, it seems all PRs for this issue are closed. This PR: https://github.com/apache/spark/pull/265 was closed claiming a more recent PR was being worked on, which I assume is this one: https://github.com/apache/spark/pull/4106 but that one was also closed due to inactivity. Looking at the issues filed and closed as duplicates of this one, there is a need and interest in getting standalone mode to access secured HDFS, given the active user's keytab is already available on the machines that run Spark. was (Author: hsaputra): All, the PRs for this issue are closed. This PR: https://github.com/apache/spark/pull/265 was closed claiming a more recent PR was being worked on, which I assume is this one: https://github.com/apache/spark/pull/4106 but that one was also closed due to inactivity. Looking at the issues filed and closed as duplicates of this one, there is a need and interest in getting standalone mode to access secured HDFS, given the active user's keytab is already available on the machines that run Spark. > Allow for keytab-based HDFS security in Standalone mode > --- > > Key: SPARK-5158 > URL: https://issues.apache.org/jira/browse/SPARK-5158 > Project: Spark > Issue Type: New Feature > Components: Spark Core > Reporter: Patrick Wendell > Assignee: Matthew Cheah > Priority: Critical > > There have been a handful of patches for allowing access to Kerberized HDFS > clusters in standalone mode. The main reason we haven't accepted these > patches has been that they rely on insecure distribution of token files from > the driver to the other components. > As a simpler solution, I wonder if we should just provide a way to have the > Spark driver and executors independently log in and acquire credentials using > a keytab. 
This would work for users who have dedicated, single-tenant > Spark clusters (i.e. they are willing to have a keytab on every machine > running Spark for their application). It wouldn't address all possible > deployment scenarios, but if it's simple I think it's worth considering. > This would also work for Spark Streaming jobs, which often run on dedicated > hardware since they are long-running services.
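A rough sketch of the proposal above, under stated assumptions: the configuration keys `spark.kerberos.principal` and `spark.kerberos.keytab` are illustrative names, not settings this issue defines, and the real login call would be Hadoop's `UserGroupInformation.loginUserFromKeytab`. It is abstracted here as a callback so the sketch stays self-contained, with no Hadoop dependency. The point is the shape of the design: each driver and executor reads its own principal/keytab pair and logs in independently, instead of receiving token files from the driver.

```scala
// Hypothetical sketch of independent keytab login per component (SPARK-5158).
// Config keys and the login callback are illustrative, not Spark's real API.
object KeytabLogin {
  // Returns Some((principal, keytab)) only when both settings are present.
  def keytabSettings(conf: Map[String, String]): Option[(String, String)] =
    for {
      principal <- conf.get("spark.kerberos.principal")
      keytab    <- conf.get("spark.kerberos.keytab")
    } yield (principal, keytab)

  // Each driver/executor would call this at startup. In real code the
  // callback would be UserGroupInformation.loginUserFromKeytab(p, k).
  // Returns true when a keytab login was attempted.
  def loginIfConfigured(conf: Map[String, String])(login: (String, String) => Unit): Boolean =
    keytabSettings(conf) match {
      case Some((p, k)) => login(p, k); true
      case None         => false
    }
}
```

Because every machine holds the keytab, no credentials ever travel over the wire, which avoids the insecure token-distribution problem the description mentions.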
[jira] [Commented] (SPARK-6479) Create off-heap block storage API (internal)
[ https://issues.apache.org/jira/browse/SPARK-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384525#comment-14384525 ] Henry Saputra commented on SPARK-6479: -- @Steve: Ah cool, thanks for clarifying =) Create off-heap block storage API (internal) Key: SPARK-6479 URL: https://issues.apache.org/jira/browse/SPARK-6479 Project: Spark Issue Type: Improvement Components: Block Manager, Spark Core Reporter: Reynold Xin Attachments: SparkOffheapsupportbyHDFS.pdf Would be great to create APIs for off-heap block stores, rather than doing a bunch of if statements everywhere.
[jira] [Commented] (SPARK-6479) Create off-heap block storage API (internal)
[ https://issues.apache.org/jira/browse/SPARK-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384304#comment-14384304 ] Henry Saputra commented on SPARK-6479: -- [~ste...@apache.org], could you clarify how this work relates to the YARN registry? I believe the off-heap storage API will be used internally by Spark to get and set the data blocks.
[jira] [Commented] (SPARK-6479) Create off-heap block storage API (internal)
[ https://issues.apache.org/jira/browse/SPARK-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378348#comment-14378348 ] Henry Saputra commented on SPARK-6479: -- Ah thanks [~sandyr], makes sense
[jira] [Comment Edited] (SPARK-6479) Create off-heap block storage API (internal)
[ https://issues.apache.org/jira/browse/SPARK-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377386#comment-14377386 ] Henry Saputra edited comment on SPARK-6479 at 3/24/15 7:02 AM: --- What do you mean by migrating Tachyon to new APIs? Are you talking about the block store in Spark? was (Author: hsaputra): What do you mean by migrating Tachyon to new APIs? Are you talking about the data store in Spark?
[jira] [Commented] (SPARK-704) ConnectionManager sometimes cannot detect loss of sending connections
[ https://issues.apache.org/jira/browse/SPARK-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284293#comment-14284293 ] Henry Saputra commented on SPARK-704: - Could someone re-assign this issue from me? With the new settings only committers can reassign JIRA issues. ConnectionManager sometimes cannot detect loss of sending connections - Key: SPARK-704 URL: https://issues.apache.org/jira/browse/SPARK-704 Project: Spark Issue Type: Bug Reporter: Charles Reiss Assignee: Henry Saputra ConnectionManager currently does not detect when SendingConnections disconnect except when it is trying to send through them. As a result, a node failure just after a connection is initiated but before any acknowledgement messages can be sent may result in a hang. ConnectionManager has code intended to detect this case by detecting the failure of a corresponding ReceivingConnection, but this code assumes that the remote host:port of the ReceivingConnection is the same as the ConnectionManagerId, which is almost never true. Additionally, there does not appear to be any reason to assume a corresponding ReceivingConnection will exist.
[jira] [Commented] (SPARK-1070) Add check for JIRA ticket in the Github pull request title/summary with CI
[ https://issues.apache.org/jira/browse/SPARK-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194961#comment-14194961 ] Henry Saputra commented on SPARK-1070: -- [~nchammas], way back when Patrick proposed the right way to send PRs, there was a discussion about requiring PRs to have a JIRA ticket prefix in the summary. This ticket was filed to address that issue/idea. Add check for JIRA ticket in the Github pull request title/summary with CI -- Key: SPARK-1070 URL: https://issues.apache.org/jira/browse/SPARK-1070 Project: Spark Issue Type: Task Components: Build Reporter: Henry Saputra Assignee: Mark Hamstra Priority: Minor As part of a discussion on the dev@ list about adding an audit trail from Spark's Github pull requests (PRs) to JIRA, we need to add a check, perhaps in the Jenkins CI, to verify that PRs contain a JIRA ticket number in the title/summary. There may be some PRs that don't need a ticket, so we should probably support a magic keyword to bypass the check, though it should be used only in rare cases.
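The check described above could look roughly like this. A minimal sketch, under two assumptions not fixed by the issue: the ticket pattern is `SPARK-<number>` (optionally bracketed) at the start of the title, and the bypass magic keyword is a hypothetical `[NOJIRA]` prefix.

```scala
// Hypothetical sketch of the CI check proposed in SPARK-1070: a PR title
// is valid if it starts with a SPARK ticket id, or with the (assumed)
// bypass keyword for the rare PRs that need no ticket.
object PrTitleCheck {
  private val BypassKeyword = "[NOJIRA]" // illustrative choice, not defined by the issue

  def isValid(title: String): Boolean =
    title.startsWith(BypassKeyword) ||
      // Optional leading '[', then SPARK-<digits>, then a word boundary
      // so e.g. "SPARK-1070x" does not count as a ticket id.
      title.matches("""\[?SPARK-\d+\b.*""")
}
```

In Jenkins this would run against the PR title fetched from the GitHub API and fail the build with a message asking the author to add the ticket prefix.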
[jira] [Commented] (SPARK-2731) Update Tachyon dependency to 0.5.0
[ https://issues.apache.org/jira/browse/SPARK-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116005#comment-14116005 ] Henry Saputra commented on SPARK-2731: -- That was quick =P Update Tachyon dependency to 0.5.0 -- Key: SPARK-2731 URL: https://issues.apache.org/jira/browse/SPARK-2731 Project: Spark Issue Type: Task Components: Spark Core Reporter: Henry Saputra Tachyon 0.5.0 [1] has been released and we would like to update Spark to that version. The new release has good improvements and important bug fixes [2] [1] http://tachyon-project.org/v0.5.0/ [2] https://github.com/amplab/tachyon/releases/tag/v0.5.0
[jira] [Closed] (SPARK-2731) Update Tachyon dependency to 0.5.0
[ https://issues.apache.org/jira/browse/SPARK-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Saputra closed SPARK-2731. Resolution: Duplicate
[jira] [Created] (SPARK-2731) Update Tachyon dependency to 0.5.0
Henry Saputra created SPARK-2731: Summary: Update Tachyon dependency to 0.5.0 Key: SPARK-2731 URL: https://issues.apache.org/jira/browse/SPARK-2731 Project: Spark Issue Type: Task Components: Spark Core Reporter: Henry Saputra
[jira] [Created] (SPARK-2732) Update build script to Tachyon 0.5.0
Henry Saputra created SPARK-2732: Summary: Update build script to Tachyon 0.5.0 Key: SPARK-2732 URL: https://issues.apache.org/jira/browse/SPARK-2732 Project: Spark Issue Type: Sub-task Reporter: Henry Saputra Update Maven pom.xml and sbt script to use Tachyon 0.5.0
[jira] [Created] (SPARK-2733) Update make-distribution.sh to download Tachyon 0.5.0
Henry Saputra created SPARK-2733: Summary: Update make-distribution.sh to download Tachyon 0.5.0 Key: SPARK-2733 URL: https://issues.apache.org/jira/browse/SPARK-2733 Project: Spark Issue Type: Sub-task Reporter: Henry Saputra Need to update make-distribution.sh to download Tachyon 0.5.0
[jira] [Created] (SPARK-2586) Lack of information to figure out connection to Tachyon master is inactive/ down
Henry Saputra created SPARK-2586: Summary: Lack of information to figure out connection to Tachyon master is inactive/ down Key: SPARK-2586 URL: https://issues.apache.org/jira/browse/SPARK-2586 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Henry Saputra When running Spark with Tachyon, if the connection to the Tachyon master is down (due to a network problem or the master node being down) there is no clear log or error message to indicate it. Here is a sample log from running the SparkTachyonPi example while connecting to Tachyon: 14/07/15 16:43:10 INFO Utils: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 14/07/15 16:43:10 WARN Utils: Your hostname, henry-pivotal.local resolves to a loopback address: 127.0.0.1; using 10.64.5.148 instead (on interface en5) 14/07/15 16:43:10 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address 14/07/15 16:43:11 INFO SecurityManager: Changing view acls to: hsaputra 14/07/15 16:43:11 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hsaputra) 14/07/15 16:43:11 INFO Slf4jLogger: Slf4jLogger started 14/07/15 16:43:11 INFO Remoting: Starting remoting 14/07/15 16:43:11 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sp...@office-5-148.pa.gopivotal.com:53203] 14/07/15 16:43:11 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sp...@office-5-148.pa.gopivotal.com:53203] 14/07/15 16:43:11 INFO SparkEnv: Registering MapOutputTracker 14/07/15 16:43:11 INFO SparkEnv: Registering BlockManagerMaster 14/07/15 16:43:11 INFO DiskBlockManager: Created local directory at /var/folders/nv/nsr_3ysj0wgfq93fqp0rdt3wgp/T/spark-local-20140715164311-e63c 14/07/15 16:43:11 INFO ConnectionManager: Bound socket to port 53204 with id = ConnectionManagerId(office-5-148.pa.gopivotal.com,53204) 14/07/15 16:43:11 INFO MemoryStore: MemoryStore started with capacity 2.1 GB 14/07/15 16:43:11 INFO BlockManagerMaster: 
Trying to register BlockManager 14/07/15 16:43:11 INFO BlockManagerMasterActor: Registering block manager office-5-148.pa.gopivotal.com:53204 with 2.1 GB RAM 14/07/15 16:43:11 INFO BlockManagerMaster: Registered BlockManager 14/07/15 16:43:11 INFO HttpServer: Starting HTTP Server 14/07/15 16:43:11 INFO HttpBroadcast: Broadcast server started at http://10.64.5.148:53205 14/07/15 16:43:11 INFO HttpFileServer: HTTP File server directory is /var/folders/nv/nsr_3ysj0wgfq93fqp0rdt3wgp/T/spark-b2fb12ae-4608-4833-87b6-b335da00738e 14/07/15 16:43:11 INFO HttpServer: Starting HTTP Server 14/07/15 16:43:12 INFO SparkUI: Started SparkUI at http://office-5-148.pa.gopivotal.com:4040 2014-07-15 16:43:12.210 java[39068:1903] Unable to load realm info from SCDynamicStore 14/07/15 16:43:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/07/15 16:43:12 INFO SparkContext: Added JAR examples/target/scala-2.10/spark-examples-1.1.0-SNAPSHOT-hadoop2.4.0.jar at http://10.64.5.148:53206/jars/spark-examples-1.1.0-SNAPSHOT-hadoop2.4.0.jar with timestamp 1405467792813 14/07/15 16:43:12 INFO AppClient$ClientActor: Connecting to master spark://henry-pivotal.local:7077... 
14/07/15 16:43:12 INFO SparkContext: Starting job: reduce at SparkTachyonPi.scala:43 14/07/15 16:43:12 INFO DAGScheduler: Got job 0 (reduce at SparkTachyonPi.scala:43) with 2 output partitions (allowLocal=false) 14/07/15 16:43:12 INFO DAGScheduler: Final stage: Stage 0(reduce at SparkTachyonPi.scala:43) 14/07/15 16:43:12 INFO DAGScheduler: Parents of final stage: List() 14/07/15 16:43:12 INFO DAGScheduler: Missing parents: List() 14/07/15 16:43:12 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at SparkTachyonPi.scala:39), which has no missing parents 14/07/15 16:43:13 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[1] at map at SparkTachyonPi.scala:39) 14/07/15 16:43:13 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks 14/07/15 16:43:13 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20140715164313- 14/07/15 16:43:13 INFO AppClient$ClientActor: Executor added: app-20140715164313-/0 on worker-20140715164009-office-5-148.pa.gopivotal.com-52519 (office-5-148.pa.gopivotal.com:52519) with 8 cores 14/07/15 16:43:13 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140715164313-/0 on hostPort office-5-148.pa.gopivotal.com:52519 with 8 cores, 512.0 MB RAM 14/07/15 16:43:13 INFO AppClient$ClientActor: Executor updated: app-20140715164313-/0 is now RUNNING 14/07/15 16:43:15 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@office-5-148.pa.gopivotal.com:53213/user/Executor#-423405256] with ID 0 14/07/15 16:43:15 INFO TaskSetManager:
[jira] [Updated] (SPARK-2586) Lack of information to figure out connection to Tachyon master is inactive/ down
[ https://issues.apache.org/jira/browse/SPARK-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Saputra updated SPARK-2586: - Labels: tachyon (was: )
[jira] [Commented] (SPARK-2586) Lack of information to figure out connection to Tachyon master is inactive/ down
[ https://issues.apache.org/jira/browse/SPARK-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067224#comment-14067224 ] Henry Saputra commented on SPARK-2586: -- Using Standalone mode, I do not see any log about Tachyon being unavailable in the Master or Worker node log files
[jira] [Commented] (SPARK-2500) Move the loginfo for registering BlockManager to BlockManagerMasterActor.register method
[ https://issues.apache.org/jira/browse/SPARK-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062904#comment-14062904 ] Henry Saputra commented on SPARK-2500: -- PR at https://github.com/apache/spark/pull/1424 Move the loginfo for registering BlockManager to BlockManagerMasterActor.register method Key: SPARK-2500 URL: https://issues.apache.org/jira/browse/SPARK-2500 Project: Spark Issue Type: Improvement Reporter: Henry Saputra Assignee: Henry Saputra Priority: Minor Move the logInfo call for registering a BlockManager to BlockManagerMasterActor.register instead of the BlockManagerInfo constructor. Previously the logInfo call for registering a BlockManager happened in the BlockManagerInfo constructor. This is confusing because the code could call new BlockManagerInfo without actually registering a BlockManager, which is misleading when reading the log files.
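The change described above can be illustrated with a simplified stand-in. The class and method names mirror the issue text but are not Spark's real implementation: the point is that the "Registering block manager" message is emitted by the register method, so constructing a BlockManagerInfo by itself never logs.

```scala
// Simplified sketch of the SPARK-2500 change: log at the registration call
// site, not in the info-holder's constructor. The in-memory `log` buffer
// stands in for logInfo so the behavior is observable.
object RegistrySketch {
  val log = scala.collection.mutable.Buffer[String]()

  // Constructor no longer logs anything.
  final case class BlockManagerInfo(id: String)

  object BlockManagerMasterActor {
    private val managers = scala.collection.mutable.Map[String, BlockManagerInfo]()

    // The logInfo call now lives here, where registration actually happens.
    def register(id: String): BlockManagerInfo = {
      log += s"Registering block manager $id"
      val info = BlockManagerInfo(id)
      managers(id) = info
      info
    }
  }
}
```

With this split, creating a BlockManagerInfo for any other purpose produces no log line, so a "Registering" message in the logs always corresponds to a real registration.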
[jira] [Commented] (SPARK-2500) Move the loginfo for registering BlockManager to BlockManagerMasterActor.register method
[ https://issues.apache.org/jira/browse/SPARK-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062902#comment-14062902 ] Henry Saputra commented on SPARK-2500: -- Working on this one
[jira] [Created] (SPARK-2500) Move the logInfo for registering BlockManager to BlockManagerMasterActor.register method
Henry Saputra created SPARK-2500: Summary: Move the logInfo for registering BlockManager to BlockManagerMasterActor.register method Key: SPARK-2500 URL: https://issues.apache.org/jira/browse/SPARK-2500 Project: Spark Issue Type: Improvement Reporter: Henry Saputra Priority: Minor Move the logInfo call for BlockManager to BlockManagerMasterActor.register instead of the BlockManagerInfo constructor. Previously, the logInfo call for registering a BlockManager happened in the BlockManagerInfo constructor. This is confusing because the code could call new BlockManagerInfo without actually registering a BlockManager, which could be misleading when reading the log files. -- This message was sent by Atlassian JIRA (v6.2#6252)
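The change described in SPARK-2500 above can be illustrated with a small sketch. This is a hypothetical simplification, not Spark's actual code: the class and method names mirror BlockManagerInfo and register only loosely, and the point is just that the registration log line moves out of the data holder's constructor and into the method where registration actually happens.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical sketch of the SPARK-2500 change: constructing an info
// object alone no longer logs "Registering ..."; only register() does.
class BlockManagerMasterSketch {
    private static final Logger log =
            Logger.getLogger(BlockManagerMasterSketch.class.getName());

    // Plain data holder: constructing it no longer implies registration.
    static class BlockManagerInfo {
        final String blockManagerId;
        final long maxMem;
        BlockManagerInfo(String blockManagerId, long maxMem) {
            this.blockManagerId = blockManagerId;
            this.maxMem = maxMem;
            // No logInfo call here anymore.
        }
    }

    private final Map<String, BlockManagerInfo> registered = new HashMap<>();

    // The log line now lives where registration actually happens, so the
    // log file only shows block managers that were truly registered.
    boolean register(String blockManagerId, long maxMem) {
        if (registered.containsKey(blockManagerId)) {
            return false; // already registered; nothing logged
        }
        log.info("Registering block manager " + blockManagerId
                + " with " + maxMem + " bytes");
        registered.put(blockManagerId, new BlockManagerInfo(blockManagerId, maxMem));
        return true;
    }

    int size() {
        return registered.size();
    }
}
```

With this shape, code that builds a BlockManagerInfo for some other purpose can no longer produce a misleading "Registering" line in the logs.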
[jira] [Issue Comment Deleted] (SPARK-1305) Support persisting RDD's directly to Tachyon
[ https://issues.apache.org/jira/browse/SPARK-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Saputra updated SPARK-1305: - Comment: was deleted (was: Sorry to comment on old JIRA but does anyone have PR for this ticket?) Support persisting RDD's directly to Tachyon Key: SPARK-1305 URL: https://issues.apache.org/jira/browse/SPARK-1305 Project: Spark Issue Type: New Feature Components: Block Manager Reporter: Patrick Wendell Assignee: Haoyuan Li Priority: Blocker Fix For: 1.0.0 This is already an ongoing pull request - in a nutshell we want to support Tachyon as a storage level in Spark. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-2192) Examples Data Not in Binary Distribution
[ https://issues.apache.org/jira/browse/SPARK-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042938#comment-14042938 ] Henry Saputra edited comment on SPARK-2192 at 6/25/14 5:51 PM: --- I think several examples already have the data in the main/resources. Do you have a list of which ones are missing? was (Author: hsaputra): I think several tests already have the data in the main/resources. Do you have list of which ones missing? Examples Data Not in Binary Distribution Key: SPARK-2192 URL: https://issues.apache.org/jira/browse/SPARK-2192 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.0.0 Reporter: Pat McDonough The data used by examples is not packaged up with the binary distribution. The data subdirectory of spark should make its way into the distribution somewhere so the examples can use it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1000) Crash when running SparkPi example with local-cluster
[ https://issues.apache.org/jira/browse/SPARK-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043908#comment-14043908 ] Henry Saputra commented on SPARK-1000: -- I am not able to reproduce this with the master branch. When you look at http://localhost:8080, do you see whether the worker has ALIVE status? Crash when running SparkPi example with local-cluster - Key: SPARK-1000 URL: https://issues.apache.org/jira/browse/SPARK-1000 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0 Reporter: xiajunluan When I run SparkPi with local-cluster[2,2,512], it throws the following exception at the end of the job:
WARNING: An exception was thrown by an exception handler.
java.util.concurrent.RejectedExecutionException
	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.start(AbstractNioWorker.java:184)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:330)
	at org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:35)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:313)
	at org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:35)
	at org.jboss.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
	at org.jboss.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:504)
	at org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:47)
	at org.jboss.netty.channel.Channels.fireChannelOpen(Channels.java:170)
	at org.jboss.netty.channel.socket.nio.NioClientSocketChannel.init(NioClientSocketChannel.java:79)
	at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.newChannel(NioClientSocketChannelFactory.java:176)
	at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.newChannel(NioClientSocketChannelFactory.java:82)
	at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:213)
	at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:183)
	at akka.remote.netty.ActiveRemoteClient$$anonfun$connect$1.apply$mcV$sp(Client.scala:173)
	at akka.util.Switch.liftedTree1$1(LockUtil.scala:33)
	at akka.util.Switch.transcend(LockUtil.scala:32)
	at akka.util.Switch.switchOn(LockUtil.scala:55)
	at akka.remote.netty.ActiveRemoteClient.connect(Client.scala:158)
	at akka.remote.netty.NettyRemoteTransport.send(NettyRemoteSupport.scala:153)
	at akka.remote.RemoteActorRef.$bang(RemoteActorRefProvider.scala:247)
	at akka.actor.LocalDeathWatch$$anonfun$publish$1.apply(ActorRefProvider.scala:559)
	at akka.actor.LocalDeathWatch$$anonfun$publish$1.apply(ActorRefProvider.scala:559)
	at scala.collection.Iterator$class.foreach(Iterator.scala:772)
	at scala.collection.immutable.VectorIterator.foreach(Vector.scala:648)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:73)
	at scala.collection.immutable.Vector.foreach(Vector.scala:63)
	at akka.actor.LocalDeathWatch.publish(ActorRefProvider.scala:559)
	at akka.remote.RemoteDeathWatch.publish(RemoteActorRefProvider.scala:280)
	at akka.remote.RemoteDeathWatch.publish(RemoteActorRefProvider.scala:262)
	at akka.actor.ActorCell.doTerminate(ActorCell.scala:701)
	at akka.actor.ActorCell.handleChildTerminated(ActorCell.scala:747)
	at akka.actor.ActorCell.systemInvoke(ActorCell.scala:608)
	at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:209)
	at akka.dispatch.Mailbox.run(Mailbox.scala:178)
	at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
	at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
	at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
	at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
	at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
-- This message was sent by Atlassian JIRA (v6.2#6252)
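The top of the stack trace in SPARK-1000 shows Netty's worker submitting a task to a thread pool whose default AbortPolicy rejects it, which happens once the pool has been shut down. The failure mode itself is easy to reproduce in isolation; this is a minimal, self-contained illustration (not Spark's teardown code): submitting work to an ExecutorService after shutdown() throws RejectedExecutionException.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Minimal reproduction of the failure mode in the trace above:
// a task handed to an already-shut-down ExecutorService is rejected
// by the default AbortPolicy with RejectedExecutionException.
class RejectedAfterShutdown {
    static boolean submitAfterShutdown() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.shutdown(); // teardown happens before the late submission
        try {
            pool.execute(() -> System.out.println("never runs"));
            return false; // not reached: execute() throws first
        } catch (RejectedExecutionException e) {
            return true; // default AbortPolicy rejects the task
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected after shutdown: " + submitAfterShutdown());
    }
}
```

This matches the trace's shape: the job finishes, pools are torn down, and a straggling Akka/Netty message triggers a submission into the dead pool.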
[jira] [Commented] (SPARK-2192) Examples Data Not in Binary Distribution
[ https://issues.apache.org/jira/browse/SPARK-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042938#comment-14042938 ] Henry Saputra commented on SPARK-2192: -- I think several tests already have the data in the main/resources. Do you have a list of which ones are missing? Examples Data Not in Binary Distribution Key: SPARK-2192 URL: https://issues.apache.org/jira/browse/SPARK-2192 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.0.0 Reporter: Pat McDonough The data used by examples is not packaged up with the binary distribution. The data subdirectory of spark should make its way into the distribution somewhere so the examples can use it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-704) ConnectionManager sometimes cannot detect loss of sending connections
[ https://issues.apache.org/jira/browse/SPARK-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041091#comment-14041091 ] Henry Saputra commented on SPARK-704: - Thanks a lot to [~woggle] and [~mridulm80] for clarifying the issue and adding comments to help make it clear what is happening. Yes, since the NIO channel for SendingConnection listens for both write and read (via the for-loop detection in the ConnectionManager), any connection loss will be detected by the SendingConnection's channel. My concern is the hang issue that Charles mentioned in the issue description; I tried to reproduce it by shutting down the node manually but could not really get into that situation. Since this is async IO, there is no way to learn of a remote node's failure when there is no activity on the socket, as Mridul mentioned, other than sending keepalive messages. ConnectionManager sometimes cannot detect loss of sending connections - Key: SPARK-704 URL: https://issues.apache.org/jira/browse/SPARK-704 Project: Spark Issue Type: Bug Reporter: Charles Reiss Assignee: Henry Saputra ConnectionManager currently does not detect when SendingConnections disconnect except when it is trying to send through them. As a result, a node failure just after a connection is initiated but before any acknowledgement messages can be sent may result in a hang. ConnectionManager has code intended to detect this case by detecting the failure of a corresponding ReceivingConnection, but this code assumes that the remote host:port of the ReceivingConnection is the same as the ConnectionManagerId, which is almost never true. Additionally, there does not appear to be any reason to assume a corresponding ReceivingConnection will exist. -- This message was sent by Atlassian JIRA (v6.2#6252)
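The detection mechanism discussed in the SPARK-704 comment above can be sketched with plain Java NIO. This is an illustrative simplification, not Spark's actual ConnectionManager: the point is that even on a sending-side channel, a read attempt observes a graceful remote close because read() returns -1 once the peer has closed. An abrupt node failure, by contrast, produces no signal at all on an idle socket, which is exactly why keepalive traffic is needed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Sketch (not Spark code): a "sending" channel notices a graceful remote
// close because its read() returns -1 after the peer closes the socket.
class PeerCloseDetection {
    static boolean detectsPeerClose() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            // The "sending" side connects to the "remote node".
            SocketChannel sending = SocketChannel.open(server.getLocalAddress());
            SocketChannel accepted = server.accept();
            accepted.close(); // simulate the remote node going away gracefully
            // The close is visible on the sending side: read() returns -1.
            int n = sending.read(ByteBuffer.allocate(16));
            sending.close();
            return n == -1;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("peer close detected: " + detectsPeerClose());
    }
}
```

Note the limit of this mechanism: it relies on a TCP FIN/RST actually arriving. If the remote machine loses power or the network partitions, nothing arrives on an idle socket, so only application-level keepalives (or TCP keepalive) can surface the failure.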
[jira] [Commented] (SPARK-1305) Support persisting RDD's directly to Tachyon
[ https://issues.apache.org/jira/browse/SPARK-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026037#comment-14026037 ] Henry Saputra commented on SPARK-1305: -- Sorry to comment on an old JIRA, but does anyone have a PR for this ticket? Support persisting RDD's directly to Tachyon Key: SPARK-1305 URL: https://issues.apache.org/jira/browse/SPARK-1305 Project: Spark Issue Type: New Feature Components: Block Manager Reporter: Patrick Wendell Assignee: Haoyuan Li Priority: Blocker Fix For: 1.0.0 This is already an ongoing pull request - in a nutshell we want to support Tachyon as a storage level in Spark. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1305) Support persisting RDD's directly to Tachyon
[ https://issues.apache.org/jira/browse/SPARK-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026038#comment-14026038 ] Henry Saputra commented on SPARK-1305: -- Never mind, found it; it was from when Spark was in the incubator. Support persisting RDD's directly to Tachyon Key: SPARK-1305 URL: https://issues.apache.org/jira/browse/SPARK-1305 Project: Spark Issue Type: New Feature Components: Block Manager Reporter: Patrick Wendell Assignee: Haoyuan Li Priority: Blocker Fix For: 1.0.0 This is already an ongoing pull request - in a nutshell we want to support Tachyon as a storage level in Spark. -- This message was sent by Atlassian JIRA (v6.2#6252)