[jira] [Comment Edited] (SPARK-15039) Kinesis reciever does not work in Yarn
[ https://issues.apache.org/jira/browse/SPARK-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268675#comment-15268675 ] Tsai Li Ming edited comment on SPARK-15039 at 5/3/16 1:16 PM: -- [~zsxwing] Nothing suspicious in the logs. The streaming tab has 1 receiver but has 0 events/sec [~jerryshao] Have not tested standalone mode. But `--master local[*]` works. I will test in Standalone mode. was (Author: ltsai): [~zsxwing] Nothing suspcisious in the logs. The streaming tab has 1 receiver but has 0 events/sec [~jerryshao] Have not tested standalone mode. But `--master local[*]` works. > Kinesis reciever does not work in Yarn > -- > > Key: SPARK-15039 > URL: https://issues.apache.org/jira/browse/SPARK-15039 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.6.0 > Environment: YARN > HDP 2.4.0 >Reporter: Tsai Li Ming > > Hi, > Using the pyspark kinesis example, it does not receive any messages from > Kinesis when submitting to a YARN cluster, though it is working fine when > using local mode. > {code} > spark-submit \ > --executor-cores 4 \ > --num-executors 4 \ > --packages > com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1 > > {code} > I had to downgrade the package to 1.5.1. 1.6.1 does not work too. > Not sure whether this is related to SPARK-12453 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15039) Kinesis reciever does not work in Yarn
[ https://issues.apache.org/jira/browse/SPARK-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268675#comment-15268675 ] Tsai Li Ming commented on SPARK-15039: -- [~zsxwing] Nothing suspicious in the logs. The streaming tab has 1 receiver but has 0 events/sec [~jerryshao] Have not tested standalone mode. But `--master local[*]` works. > Kinesis reciever does not work in Yarn > -- > > Key: SPARK-15039 > URL: https://issues.apache.org/jira/browse/SPARK-15039 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.6.0 > Environment: YARN > HDP 2.4.0 >Reporter: Tsai Li Ming > > Hi, > Using the pyspark kinesis example, it does not receive any messages from > Kinesis when submitting to a YARN cluster, though it is working fine when > using local mode. > {code} > spark-submit \ > --executor-cores 4 \ > --num-executors 4 \ > --packages > com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1 > > {code} > I had to downgrade the package to 1.5.1. 1.6.1 does not work too. > Not sure whether this is related to SPARK-12453
[jira] [Updated] (SPARK-15039) Kinesis reciever does not work in Yarn
[ https://issues.apache.org/jira/browse/SPARK-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsai Li Ming updated SPARK-15039: - Description: Hi, Using the pyspark kinesis example, it does not receive any messages from Kinesis when submitting to a YARN cluster, though it is working fine when using local mode. {code} spark-submit \ --executor-cores 4 \ --num-executors 4 \ --packages com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1 {code} I had to downgrade the package to 1.5.1. 1.6.1 does not work too. Not sure whether this is related to SPARK-12453 was: Hi, Using the pyspark kinesis example, it does not receive any messages from Kinesis when submitting to a YARN cluster, though it is working fine when using local mode. {code} spark-submit \ --executor-cores 4 \ --num-executors 4 \ --packages com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1 {code} I had to downgrade the package to 1.5.1. 1.6.1 does not work too. > Kinesis reciever does not work in Yarn > -- > > Key: SPARK-15039 > URL: https://issues.apache.org/jira/browse/SPARK-15039 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.6.0 > Environment: YARN > HDP 2.4.0 >Reporter: Tsai Li Ming > > Hi, > Using the pyspark kinesis example, it does not receive any messages from > Kinesis when submitting to a YARN cluster, though it is working fine when > using local mode. > {code} > spark-submit \ > --executor-cores 4 \ > --num-executors 4 \ > --packages > com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1 > > {code} > I had to downgrade the package to 1.5.1. 1.6.1 does not work too. 
> Not sure whether this is related to SPARK-12453
[jira] [Updated] (SPARK-15039) Kinesis reciever does not work in Yarn
[ https://issues.apache.org/jira/browse/SPARK-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsai Li Ming updated SPARK-15039: - Description: Hi, Using the pyspark kinesis example, it does not receive any messages from Kinesis when submitting to a YARN cluster, though it is working fine when using local mode. {code} spark-submit \ --executor-cores 4 \ --num-executors 4 \ --packages com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1 {code} I had to downgrade the package to 1.5.1. 1.6.1 does not work too. was: Hi, Using the pyspark kinesis example, it does not receive any messages from Kinesis when submitting to a YARN cluster, though it is working when using local mode. ``` spark-submit \ --executor-cores 4 \ --num-executors 4 \ --packages com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1 ``` I had to downgrade the package to 1.5.1 before it can work. > Kinesis reciever does not work in Yarn > -- > > Key: SPARK-15039 > URL: https://issues.apache.org/jira/browse/SPARK-15039 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.6.0 > Environment: YARN > HDP 2.4.0 >Reporter: Tsai Li Ming > > Hi, > Using the pyspark kinesis example, it does not receive any messages from > Kinesis when submitting to a YARN cluster, though it is working fine when > using local mode. > {code} > spark-submit \ > --executor-cores 4 \ > --num-executors 4 \ > --packages > com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1 > > {code} > I had to downgrade the package to 1.5.1. 1.6.1 does not work too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15039) Kinesis reciever does not work in Yarn
Tsai Li Ming created SPARK-15039: Summary: Kinesis reciever does not work in Yarn Key: SPARK-15039 URL: https://issues.apache.org/jira/browse/SPARK-15039 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.6.0 Environment: YARN HDP 2.4.0 Reporter: Tsai Li Ming Hi, Using the pyspark kinesis example, it does not receive any messages from Kinesis when submitting to a YARN cluster, though it is working when using local mode. ``` spark-submit \ --executor-cores 4 \ --num-executors 4 \ --packages com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1 ``` I had to downgrade the package to 1.5.1 before it can work. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3220) K-Means clusterer should perform K-Means initialization in parallel
[ https://issues.apache.org/jira/browse/SPARK-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140623#comment-15140623 ] Tsai Li Ming commented on SPARK-3220: - I built Derrick's kmeans against Spark 1.6.0 and ran {code} import com.massivedatascience.clusterer.KMeans val clusters = KMeans.train(parsedData, numClusters, numIterations) {code} It took 41mins with the same dataset/settings compared to 1hr using Mllib. In both cases, there was enough memory to cache everything. > K-Means clusterer should perform K-Means initialization in parallel > --- > > Key: SPARK-3220 > URL: https://issues.apache.org/jira/browse/SPARK-3220 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Derrick Burns > Labels: clustering > > The LocalKMeans method should be replaced with a parallel implementation. As > it stands now, it becomes a bottleneck for large data sets. > I have implemented this functionality in my version of the clusterer. > However, I see that there are hundreds of outstanding pull requests. If > someone on the team wants to sponsor the pull request, I will create one. > Otherwise, I will just maintain my own private fork of the clusterer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-3220) K-Means clusterer should perform K-Means initialization in parallel
[ https://issues.apache.org/jira/browse/SPARK-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140623#comment-15140623 ] Tsai Li Ming edited comment on SPARK-3220 at 2/10/16 11:01 AM: --- I built Derrick's kmeans against Spark 1.6.0 and ran {code} import com.massivedatascience.clusterer.KMeans val clusters = KMeans.train(parsedData, numClusters, numIterations) {code} It took 41mins with the same dataset/settings compared to 1hr using Mllib, though it slowed down during _reduceByKeyLocally_ phase. In both cases, there was enough memory to cache everything. was (Author: ltsai): I built Derrick's kmeans against Spark 1.6.0 and ran {code} import com.massivedatascience.clusterer.KMeans val clusters = KMeans.train(parsedData, numClusters, numIterations) {code} It took 41mins with the same dataset/settings compared to 1hr using Mllib. In both cases, there was enough memory to cache everything. > K-Means clusterer should perform K-Means initialization in parallel > --- > > Key: SPARK-3220 > URL: https://issues.apache.org/jira/browse/SPARK-3220 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Derrick Burns > Labels: clustering > > The LocalKMeans method should be replaced with a parallel implementation. As > it stands now, it becomes a bottleneck for large data sets. > I have implemented this functionality in my version of the clusterer. > However, I see that there are hundreds of outstanding pull requests. If > someone on the team wants to sponsor the pull request, I will create one. > Otherwise, I will just maintain my own private fork of the clusterer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3220) K-Means clusterer should perform K-Means initialization in parallel
[ https://issues.apache.org/jira/browse/SPARK-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140434#comment-15140434 ] Tsai Li Ming commented on SPARK-3220: - [~derrickburns], Is your private fork at https://github.com/derrickburns/generalized-kmeans-clustering ? I am having the same problem here: http://apache-spark-developers-list.1001551.n3.nabble.com/Re-Kmeans-using-1-core-only-Was-Slowness-in-Kmeans-calculating-fastSquaredDistance-td16304.html > K-Means clusterer should perform K-Means initialization in parallel > --- > > Key: SPARK-3220 > URL: https://issues.apache.org/jira/browse/SPARK-3220 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Derrick Burns > Labels: clustering > > The LocalKMeans method should be replaced with a parallel implementation. As > it stands now, it becomes a bottleneck for large data sets. > I have implemented this functionality in my version of the clusterer. > However, I see that there are hundreds of outstanding pull requests. If > someone on the team wants to sponsor the pull request, I will create one. > Otherwise, I will just maintain my own private fork of the clusterer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
The usage of OpenBLAS
Hi, I found out that the instructions for OpenBLAS have been changed by the author of netlib-java in https://github.com/apache/spark/pull/4448, since Spark 1.3.0. In that PR, I asked whether there’s still a need to compile OpenBLAS with USE_THREAD=0, and also about Intel MKL. Is that still required, or is it no longer the case? Thanks, Liming - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Issues building 1.4.0 using make-distribution
Hi, I downloaded the source from Downloads page and ran the make-distribution.sh script. # ./make-distribution.sh --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package The script has “-x” set in the beginning. ++ /tmp/a/spark-1.4.0/build/mvn help:evaluate -Dexpression=project.version -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package ++ grep -v INFO ++ tail -n 1 + VERSION='[WARNING] See http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin' ++ /tmp/a/spark-1.4.0/build/mvn help:evaluate -Dexpression=scala.binary.version -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package ++ grep -v INFO ++ tail -n 1 + SCALA_VERSION='[WARNING] See http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin' ++ /tmp/a/spark-1.4.0/build/mvn help:evaluate -Dexpression=hadoop.version -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package ++ grep -v INFO ++ tail -n 1 … + TARDIR_NAME='spark-[WARNING] See http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin-bin-[WARNING] See http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin' + TARDIR='/tmp/a/spark-1.4.0/spark-[WARNING] See http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin-bin-[WARNING] See http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin' + rm -rf '/tmp/a/spark-1.4.0/spark-[WARNING] See http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin-bin-[WARNING] See http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin' + cp -r /tmp/a/spark-1.4.0/dist '/tmp/a/spark-1.4.0/spark-[WARNING] See http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin-bin-[WARNING] See http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin' cp: cannot create directory `/tmp/a/spark-1.4.0/spark-[WARNING] See http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin-bin-[WARNING] See http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin': No such file or directory The dist directory seems complete and does work. 
Thanks, Liming
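The trace above shows VERSION picking up a "[WARNING]" line because the script keeps whatever line comes last after dropping INFO lines. A minimal Python sketch of a more defensive extraction follows. It assumes the evaluated value is printed on a bare line while Maven log lines start with a bracketed level. The function name and the sample output are hypothetical illustrations, not part of make-distribution.sh.

```python
def extract_maven_value(mvn_output):
    """Return the last non-empty line that is not a Maven log line.

    Assumption: Maven log lines start with a bracketed level such as
    [INFO] or [WARNING], and the evaluated value sits on a bare line.
    """
    values = [
        line.strip()
        for line in mvn_output.splitlines()
        if line.strip() and not line.strip().startswith("[")
    ]
    return values[-1] if values else None


# Hypothetical output shaped like the trace above: the value is printed,
# then a later build phase emits a warning as the final line.
sample_output = "\n".join([
    "[INFO] Scanning for projects...",
    "1.4.0",
    "[WARNING] See http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin",
])

print(extract_maven_value(sample_output))
```

Filtering on the bracket prefix rather than on the word INFO means a trailing warning no longer ends up in the tarball name.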
Documentation for external shuffle service in 1.4.0
Hi, I can’t seem to find any documentation on this feature in 1.4.0? Regards, Liming
Re: Not getting event logs = spark 1.3.1
Forgot to mention this is on standalone mode. Is my configuration wrong? Thanks, Liming
On 15 Jun, 2015, at 11:26 pm, Tsai Li Ming mailingl...@ltsai.com wrote:
Hi, I have this in my spark-defaults.conf (same for hdfs):
spark.eventLog.enabled true
spark.eventLog.dir file:/tmp/spark-events
spark.history.fs.logDirectory file:/tmp/spark-events
While the app is running, there is a “.inprogress” directory. However, when the job completes, the directory is always empty. I’m submitting the job like this, using either the Pi or word count examples:
$ bin/spark-submit /opt/spark-1.4.0-bin-hadoop2.6/examples/src/main/python/wordcount.py
This used to work in 1.2.1; I didn’t test 1.3.0. Regards, Liming
Not getting event logs = spark 1.3.1
Hi, I have this in my spark-defaults.conf (same for hdfs):
spark.eventLog.enabled true
spark.eventLog.dir file:/tmp/spark-events
spark.history.fs.logDirectory file:/tmp/spark-events
While the app is running, there is a “.inprogress” directory. However, when the job completes, the directory is always empty. I’m submitting the job like this, using either the Pi or word count examples:
$ bin/spark-submit /opt/spark-1.4.0-bin-hadoop2.6/examples/src/main/python/wordcount.py
This used to work in 1.2.1; I didn’t test 1.3.0. Regards, Liming
Re: Logstash as a source?
I have been using a logstash alternative, fluentd, to ingest the data into hdfs. I had to configure fluentd to not append the data so that spark streaming will be able to pick up the new logs. -Liming
On 2 Feb, 2015, at 6:05 am, NORD SC jan.algermis...@nordsc.com wrote:
Hi, I plan to have logstash send log events (as key value pairs) to spark streaming using Spark on Cassandra. Being completely fresh to Spark, I have a couple of questions:
- is that a good idea at all, or would it be better to put e.g. Kafka in between to handle traffic peaks (IOW: how and how well would Spark Streaming handle peaks?)
- Is there already a logstash-source implementation for Spark Streaming
- assuming there is none yet and assuming it is a good idea: I’d dive into writing it myself - what would the core advice be to avoid beginner traps?
Jan
Re: Confused why I'm losing workers/executors when writing a large file to S3
I’m getting the same issue on Spark 1.2.0. Despite having set “spark.core.connection.ack.wait.timeout” in spark-defaults.conf and verified it in the job UI (port 4040) environment tab, I still get the “no heartbeat in 60 seconds” error. spark.core.connection.ack.wait.timeout=3600 15/01/22 07:29:36 WARN master.Master: Removing worker-20150121231529-numaq1-4-34948 because we got no heartbeat in 60 seconds On 14 Nov, 2014, at 3:04 pm, Reynold Xin r...@databricks.com wrote: Darin, You might want to increase these config options also: spark.akka.timeout 300 spark.storage.blockManagerSlaveTimeoutMs 30 On Thu, Nov 13, 2014 at 11:31 AM, Darin McBeath ddmcbe...@yahoo.com.invalid wrote: For one of my Spark jobs, my workers/executors are dying and leaving the cluster. On the master, I see something like the following in the log file. I'm surprised to see the '60' seconds in the master log below because I explicitly set it to '600' (or so I thought) in my spark job (see below). This is happening at the end of my job when I'm trying to persist a large RDD (probably around 300+GB) back to S3 (in 256 partitions). My cluster consists of 6 r3.8xlarge machines. The job successfully works when I'm outputting 100GB or 200GB. If you have any thoughts/insights, it would be appreciated. Thanks. Darin. Here is where I'm setting the 'timeout' in my spark job. SparkConf conf = new SparkConf() .setAppName("SparkSync Application") .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") .set("spark.rdd.compress", "true") .set("spark.core.connection.ack.wait.timeout", "600"); On the master, I see the following in the log file. 
14/11/13 17:20:39 WARN master.Master: Removing worker-20141113134801-ip-10-35-184-232.ec2.internal-51877 because we got no heartbeat in 60 seconds 14/11/13 17:20:39 INFO master.Master: Removing worker worker-20141113134801-ip-10-35-184-232.ec2.internal-51877 on ip-10-35-184-232.ec2.internal:51877 14/11/13 17:20:39 INFO master.Master: Telling app of lost executor: 2 On a worker, I see something like the following in the log file. 14/11/13 17:20:58 WARN util.AkkaUtils: Error sending message in 1 attempts java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176) at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:362) 14/11/13 17:21:11 INFO httpclient.HttpMethodDirector: I/O exception (java.net.SocketException) caught when processing request: Broken pipe 14/11/13 17:21:11 INFO httpclient.HttpMethodDirector: Retrying request 14/11/13 17:21:32 INFO httpclient.HttpMethodDirector: I/O exception (java.net.SocketException) caught when processing request: Broken pipe 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: Retrying request 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: I/O exception (java.io.IOException) caught when processing request: Resetting to invalid mark 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: Retrying request 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: I/O exception (java.io.IOException) caught when processing request: Resetting to invalid mark 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: Retrying request 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: I/O 
exception (java.io.IOException) caught when processing request: Resetting to invalid mark 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: Retrying request 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: I/O exception (java.io.IOException) caught when processing request: Resetting to invalid mark 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: Retrying request 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: I/O exception (java.io.IOException) caught when processing request: Resetting to invalid mark 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: Retrying request 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: I/O exception (java.io.IOException) caught when processing request: Resetting to invalid mark 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: Retrying request 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: I/O exception (java.io.IOException) caught when processing request: Resetting to invalid mark 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: Retrying request 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: I/O exception (java.io.IOException) caught when processing request: Resetting to invalid mark 14/11/13 17:21:34 INFO
Understanding stages in WebUI
Hi, I have the classic word count example: file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).collect() From the Job UI, I can only see 2 stages: 0-collect and 1-map. What happened to the ShuffledRDD in reduceByKey? And are both the flatMap and map operations collapsed into a single stage? 14/11/25 16:02:35 INFO SparkContext: Starting job: collect at console:15 14/11/25 16:02:35 INFO DAGScheduler: Registering RDD 6 (map at console:15) 14/11/25 16:02:35 INFO DAGScheduler: Got job 0 (collect at console:15) with 2 output partitions (allowLocal=false) 14/11/25 16:02:35 INFO DAGScheduler: Final stage: Stage 0(collect at console:15) 14/11/25 16:02:35 INFO DAGScheduler: Parents of final stage: List(Stage 1) 14/11/25 16:02:35 INFO DAGScheduler: Missing parents: List(Stage 1) 14/11/25 16:02:35 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[6] at map at console:15), which has no missing parents 14/11/25 16:02:35 INFO MemoryStore: ensureFreeSpace(3464) called with curMem=163705, maxMem=278302556 14/11/25 16:02:35 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.4 KB, free 265.3 MB) 14/11/25 16:02:35 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[6] at map at console:15) 14/11/25 16:02:35 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks 14/11/25 16:02:35 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, localhost, PROCESS_LOCAL, 1208 bytes) 14/11/25 16:02:35 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1208 bytes) 14/11/25 16:02:35 INFO Executor: Running task 0.0 in stage 1.0 (TID 0) 14/11/25 16:02:35 INFO Executor: Running task 1.0 in stage 1.0 (TID 1) 14/11/25 16:02:35 INFO HadoopRDD: Input split: file:/Users/ltsai/Downloads/spark-1.1.0-bin-hadoop2.4/README.md:0+2405 14/11/25 16:02:35 INFO HadoopRDD: Input split: file:/Users/ltsai/Downloads/spark-1.1.0-bin-hadoop2.4/README.md:2405+2406 14/11/25 16:02:35 INFO deprecation: mapred.tip.id is 
deprecated. Instead, use mapreduce.task.id 14/11/25 16:02:35 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id 14/11/25 16:02:35 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap 14/11/25 16:02:35 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id 14/11/25 16:02:35 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition 14/11/25 16:02:36 INFO Executor: Finished task 0.0 in stage 1.0 (TID 0). 1869 bytes result sent to driver 14/11/25 16:02:36 INFO Executor: Finished task 1.0 in stage 1.0 (TID 1). 1869 bytes result sent to driver 14/11/25 16:02:36 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 536 ms on localhost (1/2) 14/11/25 16:02:36 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 529 ms on localhost (2/2) 14/11/25 16:02:36 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 14/11/25 16:02:36 INFO DAGScheduler: Stage 1 (map at console:15) finished in 0.562 s 14/11/25 16:02:36 INFO DAGScheduler: looking for newly runnable stages 14/11/25 16:02:36 INFO DAGScheduler: running: Set() 14/11/25 16:02:36 INFO DAGScheduler: waiting: Set(Stage 0) 14/11/25 16:02:36 INFO DAGScheduler: failed: Set() 14/11/25 16:02:36 INFO DAGScheduler: Missing parents for Stage 0: List() 14/11/25 16:02:36 INFO DAGScheduler: Submitting Stage 0 (ShuffledRDD[7] at reduceByKey at console:15), which is now runnable 14/11/25 16:02:36 INFO MemoryStore: ensureFreeSpace(2112) called with curMem=167169, maxMem=278302556 14/11/25 16:02:36 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.1 KB, free 265.2 MB) 14/11/25 16:02:36 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (ShuffledRDD[7] at reduceByKey at console:15) 14/11/25 16:02:36 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks 14/11/25 16:02:36 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 
2, localhost, PROCESS_LOCAL, 948 bytes) 14/11/25 16:02:36 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 3, localhost, PROCESS_LOCAL, 948 bytes) 14/11/25 16:02:36 INFO Executor: Running task 0.0 in stage 0.0 (TID 2) 14/11/25 16:02:36 INFO Executor: Running task 1.0 in stage 0.0 (TID 3) 14/11/25 16:02:36 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329 14/11/25 16:02:36 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329 14/11/25 16:02:36 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks 14/11/25 16:02:36 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks 14/11/25 16:02:36 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 5 ms 14/11/25 16:02:36 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0
RDD memory and storage level option
Hi, This is on version 1.1.0. I did a simple test on the MEMORY_AND_DISK storage level.
var file = sc.textFile("file:///path/to/file.txt").persist(StorageLevel.MEMORY_AND_DISK)
file.count()
The file is 1.5GB and there is only 1 worker. I have requested 1GB of worker memory per node:
ID                       Name         Cores  Memory per Node  Submitted Time       User  State    Duration
app-20141120193912-0002  Spark shell  64     1024.0 MB        2014/11/20 19:39:12  root  RUNNING  6.0 min
After doing a simple count, the job web ui indicates the entire file is saved on disk?
RDD Name                  Storage Level                  Cached Partitions  Fraction Cached  Size in Memory  Size in Tachyon  Size on Disk
file:///path/to/file.txt  Disk Serialized 1x Replicated  46                 100%             0.0 B           0.0 B            1476.5 MB
1. Shouldn’t some partitions be saved into memory?
2. If I run with the MEMORY_ONLY option, I can save some partitions into memory, but there is still space left according to the executor page (220.6 MB / 530.3 MB), and it did not use it all up? Each partition is about 73MB.
RDD Name                  Storage Level                      Cached Partitions  Fraction Cached  Size in Memory  Size in Tachyon  Size on Disk
file:///path/to/file.txt  Memory Deserialized 1x Replicated  3                  7%               220.6 MB        0.0 B            0.0 B
Executor ID  Address       RDD Blocks  Memory Used          Disk Used  Active Tasks  Failed Tasks  Complete Tasks  Total Tasks  Task Time  Input      Shuffle Read  Shuffle Write
0            foo.co:48660  3           220.6 MB / 530.3 MB  0.0 B      0             0             46              46           14.2 m     1457.4 MB  0.0 B         0.0 B
14/11/20 19:53:22 INFO BlockManagerInfo: Added rdd_1_22 in memory on foo.co:48660 (size: 73.6 MB, free: 309.6 MB)
14/11/20 19:53:22 INFO TaskSetManager: Finished task 22.0 in stage 0.0 (TID 22) in 29833 ms on foo.co (43/46)
14/11/20 19:53:24 INFO TaskSetManager: Finished task 33.0 in stage 0.0 (TID 33) in 31502 ms on foo.co (44/46)
14/11/20 19:53:24 INFO TaskSetManager: Finished task 24.0 in stage 0.0 (TID 24) in 31651 ms on foo.co (45/46)
14/11/20 19:53:24 INFO TaskSetManager: Finished task 14.0 in stage 0.0 (TID 14) in 31782 ms on foo.co (46/46)
14/11/20 19:53:24 INFO DAGScheduler: Stage 0 (count at 
console:16) finished in 31.818 s
14/11/20 19:53:24 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/11/20 19:53:24 INFO SparkContext: Job finished: count at console:16, took 31.926585742 s
res0: Long = 1000
Is this correct?
3. I can’t seem to work out the math to derive the 530MB that is made available in the executor. 1024MB * memoryFraction(0.6) = 614.4
Thanks!
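One possible way to arrive at a figure near 530 MB is sketched below in Python. It assumes that Spark 1.x sizes the storage pool as the JVM-reported maximum heap multiplied by spark.storage.memoryFraction (0.6) and by spark.storage.safetyFraction (0.9), and that the JVM reports somewhat less than the requested 1024 MB as its maximum heap. The 982 MB heap value is an assumption back-computed from the UI figure, not a measurement from this cluster.

```python
def storage_memory_mb(max_heap_mb, memory_fraction=0.6, safety_fraction=0.9):
    """Storage pool size in MB under the assumed Spark 1.x formula."""
    return max_heap_mb * memory_fraction * safety_fraction


# The calculation from the question: no safety fraction, full 1024 MB heap.
naive_mb = 1024 * 0.6

# Adding the safety fraction alone still overshoots the UI figure.
with_safety_mb = storage_memory_mb(1024)

# Assumed JVM-reported max heap of about 982 MB (hypothetical value).
estimated_mb = storage_memory_mb(982)

print(round(naive_mb, 2), round(with_safety_mb, 2), round(estimated_mb, 2))
```

Under these assumptions the gap between 614.4 MB and 530.3 MB comes from two factors together: the safety fraction and the difference between the requested heap and what the JVM reports as usable.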
Re: When does Spark switch from PROCESS_LOCAL to NODE_LOCAL or RACK_LOCAL?
Another observation I had was when reading from the local filesystem with “file://”: it was reported as PROCESS_LOCAL, which was confusing. Regards, Liming On 13 Sep, 2014, at 3:12 am, Nicholas Chammas nicholas.cham...@gmail.com wrote: Andrew, This email was pretty helpful. I feel like this stuff should be summarized in the docs somewhere, or perhaps in a blog post. Do you know if it is? Nick On Thu, Jun 5, 2014 at 6:36 PM, Andrew Ash and...@andrewash.com wrote: The locality is how close the data is to the code that's processing it. PROCESS_LOCAL means data is in the same JVM as the code that's running, so it's really fast. NODE_LOCAL might mean that the data is in HDFS on the same node, or in another executor on the same node, so is a little slower because the data has to travel across an IPC connection. RACK_LOCAL is even slower -- data is on a different server so needs to be sent over the network. Spark switches to lower locality levels when there's no unprocessed data on a node that has idle CPUs. In that situation you have two options: wait until the busy CPUs free up so you can start another task that uses data on that server, or start a new task on a farther away server that needs to bring data from that remote place. What Spark typically does is wait a bit in the hopes that a busy CPU frees up. Once that timeout expires, it starts moving the data from far away to the free CPU. The main tunable option is how long the scheduler waits before starting to move data rather than code. Those are the spark.locality.* settings here: http://spark.apache.org/docs/latest/configuration.html If you want to prevent this from happening entirely, you can set the values to ridiculously high numbers. The documentation also mentions that 0 has special meaning, so you can try that as well. Good luck! 
Andrew On Thu, Jun 5, 2014 at 3:13 PM, Sung Hwan Chung coded...@cs.stanford.edu wrote: I noticed that sometimes tasks would switch from PROCESS_LOCAL (I'd assume that this means fully cached) to NODE_LOCAL or even RACK_LOCAL. When these happen things get extremely slow. Does this mean that the executor got terminated and restarted? Is there a way to prevent this from happening (barring the machine actually going down, I'd rather stick with the same process)?
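The waiting behaviour Andrew describes can be sketched as a toy model. This is not Spark's scheduler code: the function name and the 3000 ms per-level waits are assumptions for illustration (check the configuration page for the real spark.locality.wait defaults in your version), and the real scheduler also resets its timer whenever a task is launched.

```python
# Locality levels from most local to least local.
LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY"]


def allowed_level(elapsed_ms, waits_ms):
    """Return the least-local level a task may be launched at (toy model).

    elapsed_ms: time spent waiting without launching a task.
    waits_ms:   assumed per-level timeouts, keyed by level name.
    """
    index = 0
    remaining = elapsed_ms
    # Fall back one level each time the wait for the current level is used up.
    while index < len(LEVELS) - 1 and remaining >= waits_ms[LEVELS[index]]:
        remaining -= waits_ms[LEVELS[index]]
        index += 1
    return LEVELS[index]


# Assumed timeouts of 3000 ms per level.
default_waits = {"PROCESS_LOCAL": 3000, "NODE_LOCAL": 3000, "RACK_LOCAL": 3000}
# "Ridiculously high" waits keep tasks pinned to the most local level.
pinned_waits = {level: 10**9 for level in default_waits}
# Zero waits mean the scheduler never holds out for better locality.
zero_waits = {level: 0 for level in default_waits}

for elapsed in (0, 3000, 6000, 9000):
    print(elapsed, allowed_level(elapsed, default_waits))
```

In this model, very large waits keep every task at PROCESS_LOCAL at the cost of idle CPUs, while zero waits fall through to ANY immediately.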
[slurm-dev] Cyclic distribution problem
Hi, I’m running 2 slurmds on a single host (built with --enable-multiple-slurmd). The total CPUs are divided equally between the 2 nodes. I’m trying to test the block/cyclic distribution modes, but the tasks are always allocated on the first node unless I use --ntasks-per-node=1:
$ srun -n2 --distribution=block/cyclic sleep 100
I’m using:
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_MEMORY
NodeName=ltsai-dev-rhel7-1 NodeHostname=ltsai-dev-rhel7 Port=17001 Sockets=1 CoresPerSocket=2 ThreadsPerCore=1 RealMemory=1841 State=UNKNOWN
NodeName=ltsai-dev-rhel7-2 NodeHostname=ltsai-dev-rhel7 Port=17002 Sockets=1 CoresPerSocket=2 ThreadsPerCore=1 RealMemory=1841 State=UNKNOWN
PartitionName=compute Nodes=ltsai-dev-rhel7-[1-2] Default=YES MaxTime=INFINITE State=UP
Did I misconfigure something? Thanks!
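For readers unfamiliar with the two modes, here is a conceptual sketch of what block and cyclic placement mean once several nodes are already part of the allocation. It is only a toy model with a hypothetical function name; it does not reproduce Slurm's allocation logic and does not by itself explain the behaviour reported above.

```python
def distribute(num_tasks, nodes, cpus_per_node, mode):
    """Place task ranks on nodes in 'block' or 'cyclic' order (toy model)."""
    placement = {node: [] for node in nodes}
    for task in range(num_tasks):
        if mode == "block":
            # Fill one node up to its CPU count before moving to the next.
            index = min(task // cpus_per_node, len(nodes) - 1)
        elif mode == "cyclic":
            # Hand out tasks round-robin across the nodes.
            index = task % len(nodes)
        else:
            raise ValueError("mode must be 'block' or 'cyclic'")
        placement[nodes[index]].append(task)
    return placement


nodes = ["ltsai-dev-rhel7-1", "ltsai-dev-rhel7-2"]
block = distribute(2, nodes, cpus_per_node=2, mode="block")
cyclic = distribute(2, nodes, cpus_per_node=2, mode="cyclic")
print(block)
print(cyclic)
```

With two tasks and two CPUs per node, block placement keeps both tasks on the first node, while cyclic placement puts one task on each.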
[slurm-dev] Reserved Partition name?
Hi, I am using the following partition name DEFAULT/default but slurmctld is not able to start. NodeName=compute State=UNKNOWN PartitionName=default Nodes=compute Default=YES MaxTime=INFINITE State=UP slurmctld: debug: Reading slurm.conf file: /opt/slurm-14.03.0/etc/slurm.conf slurmctld: topology NONE plugin loaded slurmctld: debug: No DownNodes slurmctld: fatal: No PartitionName information available! I can use other names such as “debug” or “compute”. This is Slurm version 14.03. Thanks!
[slurm-dev] Mismatch in cpu configuration
Hi, I’m testing slurm on my vm. My compute node is defined in slurmd.conf without any CPU/Socket/Core/Thread information:
NodeName=compute State=UNKNOWN
# ./slurmd -C
ClusterName=(null) NodeName=compute CPUs=2 Boards=1 SocketsPerBoard=1 CoresPerSocket=2 ThreadsPerCore=1 RealMemory=1463 TmpDisk=14649 UpTime=0-07:10:57
However, when I start slurmd:
# ./slurmd -D -vvv
slurmd: debug2: hwloc_topology_init
slurmd: debug2: hwloc_topology_load
slurmd: debug: CPUs:2 Boards:1 Sockets:1 CoresPerSocket:2 ThreadsPerCore:1
slurmd: Node configuration differs from hardware: CPUs=1:2(hw) Boards=1:1(hw) SocketsPerBoard=1:1(hw) CoresPerSocket=1:2(hw) ThreadsPerCore=1:1(hw)
slurmd: topology NONE plugin loaded
slurmd: CPU frequency setting not configured for this node
slurmd: task NONE plugin loaded
slurmd: auth plugin for Munge (http://code.google.com/p/munge/) loaded
slurmd: debug: spank: opening plugin stack /opt/slurm-14.03.0/etc/plugstack.conf
slurmd: Munge cryptographic signature plugin loaded
slurmd: Warning: Core limit is only 0 KB
slurmd: slurmd version 14.03.0 started
slurmd: Job accounting gather NOT_INVOKED plugin loaded
slurmd: debug: job_container none plugin loaded
slurmd: switch NONE plugin loaded
slurmd: slurmd started on Sun, 06 Apr 2014 09:26:11 +0800
slurmd: CPUs=1 Boards=1 Sockets=1 Cores=1 Threads=1 Memory=1463 TmpDisk=14649 Uptime=25884
slurmd: AcctGatherEnergy NONE plugin loaded
slurmd: AcctGatherProfile NONE plugin loaded
slurmd: AcctGatherInfiniband NONE plugin loaded
slurmd: AcctGatherFilesystem NONE plugin loaded
slurmd: debug2: No acct_gather.conf file (/opt/slurm-14.03.0/etc/acct_gather.conf)
Do I have to manually configure them for each compute node? Thanks!
Re: Hadoop LR comparison
Thanks. What will be equivalent code in Hadoop where Spark published the 110s/0.9s comparison? On 1 Apr, 2014, at 2:44 pm, DB Tsai dbt...@alpinenow.com wrote: Hi Li-Ming, This binary logistic regression using SGD is in https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala We're working on multinomial logistic regression using Newton and L-BFGS optimizer now. Will be released soon. Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs -- Web: http://alpinenow.com/ On Mon, Mar 31, 2014 at 11:38 PM, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, Is the code available for Hadoop to calculate the Logistic Regression hyperplane? I’m looking at the Examples: http://spark.apache.org/examples.html, where there is the 110s vs 0.9s in Hadoop vs Spark comparison. Thanks!
Re: Configuring shuffle write directory
Hi, Thanks! I found out that I wasn’t setting SPARK_JAVA_OPTS correctly. I took a look at the process table and saw that the "org.apache.spark.executor.CoarseGrainedExecutorBackend" process didn’t have -Dspark.local.dir set.
On 28 Mar, 2014, at 1:05 pm, Matei Zaharia matei.zaha...@gmail.com wrote: I see, are you sure that was in spark-env.sh instead of spark-env.sh.template? You need to copy it to just a .sh file. Also make sure the file is executable. Try doing println(sc.getConf.toDebugString) in your driver program and seeing what properties it prints. As far as I can tell, spark.local.dir should *not* be set there, so workers should get it from their spark-env.sh. It’s true that if you set spark.local.dir in the driver it would pass that on to the workers for that job. Matei
On Mar 27, 2014, at 9:57 PM, Tsai Li Ming mailingl...@ltsai.com wrote: Yes, I have tried that by adding it to the Worker. I can see the "app-20140328124540-000" directory in the local spark directory of the worker. But the "spark-local" directories are always written to /tmp. Is that because the default spark.local.dir is taken from java.io.tmpdir?
On 28 Mar, 2014, at 12:42 pm, Matei Zaharia matei.zaha...@gmail.com wrote: Yes, the problem is that the driver program is overriding it. Have you set it manually in the driver? Or how did you try setting it in workers? You should set it by adding export SPARK_JAVA_OPTS="-Dspark.local.dir=whatever" to conf/spark-env.sh on those workers. Matei
On Mar 27, 2014, at 9:04 PM, Tsai Li Ming mailingl...@ltsai.com wrote: Can anyone help? How can I configure a different spark.local.dir for each executor?
On 23 Mar, 2014, at 12:11 am, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, Each of my worker nodes has its own unique spark.local.dir. However, when I run spark-shell, the shuffle writes are always written to /tmp despite the property being set when the worker node is started. Specifying spark.local.dir for the driver program seems to override the executor setting?
Is there a way to properly define it in the worker node? Thanks!
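The check described at the top of this thread (looking at the executor's command line for -Dspark.local.dir) can be sketched as below. The function name and the sample command lines are hypothetical, written only to show the idea; a real command line would come from the process table on the worker.

```python
import shlex


def java_system_properties(command_line):
    """Collect -Dkey=value JVM system properties from a command line string."""
    properties = {}
    for token in shlex.split(command_line):
        if token.startswith("-D") and "=" in token:
            key, value = token[2:].split("=", 1)
            properties[key] = value
    return properties


# Hypothetical executor command lines, not copied from a real process table.
with_dir = ("java -Dspark.local.dir=/mnt/storage1/lm -Xmx14g "
            "org.apache.spark.executor.CoarseGrainedExecutorBackend")
without_dir = "java -Xmx14g org.apache.spark.executor.CoarseGrainedExecutorBackend"

print(java_system_properties(with_dir).get("spark.local.dir"))
print(java_system_properties(without_dir).get("spark.local.dir"))
```

If the property is absent from the executor's command line, the setting never reached the executor JVM, which matches what was eventually found with SPARK_JAVA_OPTS.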
Re: Configuring shuffle write directory
Can anyone help? How can I configure a different spark.local.dir for each executor? On 23 Mar, 2014, at 12:11 am, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, Each of my worker nodes has its own unique spark.local.dir. However, when I run spark-shell, the shuffle writes are always written to /tmp despite the property being set when the worker node is started. Specifying spark.local.dir for the driver program seems to override the executor setting? Is there a way to properly define it in the worker node? Thanks!
Setting SPARK_MEM higher than available memory in driver
Hi, My worker nodes have more memory than the host that I’m submitting my driver program, but it seems that SPARK_MEM is also setting the Xmx of the spark shell? $ SPARK_MEM=100g MASTER=spark://XXX:7077 bin/spark-shell Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x7f736e13, 205634994176, 0) failed; error='Cannot allocate memory' (errno=12) # # There is insufficient memory for the Java Runtime Environment to continue. # Native memory allocation (malloc) failed to allocate 205634994176 bytes for committing reserved memory. I want to allocate at least 100GB of memory per executor. The allocated memory on the executor seems to depend on the -Xmx heap size of the driver? Thanks!
Re: Kmeans example reduceByKey slow
Hi, This is on a 4-node cluster, each node with 32 cores/256GB RAM. Spark (0.9.0) is deployed in standalone mode. Each worker is configured with 192GB. Spark executor memory is also 192GB. This is on the first iteration. K=50. Here’s the code I use: http://pastebin.com/2yXL3y8i , which is a copy-and-paste of the example. Thanks!
On 24 Mar, 2014, at 2:46 pm, Xiangrui Meng men...@gmail.com wrote: Hi Tsai, Could you share more information about the machine you used and the training parameters (runs, k, and iterations)? It can help solve your issues. Thanks! Best, Xiangrui
On Sun, Mar 23, 2014 at 3:15 AM, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, At the reduceByKey stage, it takes a few minutes before the tasks start working. I have -Dspark.default.parallelism=127 cores (n-1). CPU/Network/IO is idling across all nodes when this is happening. And there is nothing particular in the master log file. From the spark-shell:
14/03/23 18:13:50 INFO TaskSetManager: Starting task 3.0:124 as TID 538 on executor 2: XXX (PROCESS_LOCAL)
14/03/23 18:13:50 INFO TaskSetManager: Serialized task 3.0:124 as 38765155 bytes in 193 ms
14/03/23 18:13:50 INFO TaskSetManager: Starting task 3.0:125 as TID 539 on executor 1: XXX (PROCESS_LOCAL)
14/03/23 18:13:50 INFO TaskSetManager: Serialized task 3.0:125 as 38765155 bytes in 96 ms
14/03/23 18:13:50 INFO TaskSetManager: Starting task 3.0:126 as TID 540 on executor 0: XXX (PROCESS_LOCAL)
14/03/23 18:13:50 INFO TaskSetManager: Serialized task 3.0:126 as 38765155 bytes in 100 ms
But it stops there for some significant time before any movement. In the stage detail of the UI, I can see that there are 127 tasks running but the duration of each is at least a few minutes. I'm working off local storage (not hdfs) and the kmeans data is about 6.5GB (50M rows). Is this normal behaviour? Thanks!
Re: Kmeans example reduceByKey slow
Thanks, Let me try with a smaller K. Does the size of the input data matter for the example? Currently I have 50M rows. What is a reasonable size to demonstrate the capability of Spark? On 24 Mar, 2014, at 3:38 pm, Xiangrui Meng men...@gmail.com wrote: K = 50 is certainly a large number for k-means. If there is no particular reason to have 50 clusters, could you try to reduce it to, e.g, 100 or 1000? Also, the example code is not for large-scale problems. You should use the KMeans algorithm in mllib clustering for your problem. -Xiangrui On Sun, Mar 23, 2014 at 11:53 PM, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, This is on a 4-node cluster, each node with 32 cores/256GB RAM. Spark (0.9.0) is deployed in standalone mode. Each worker is configured with 192GB. Spark executor memory is also 192GB. This is on the first iteration. K=50. Here's the code I use: http://pastebin.com/2yXL3y8i , which is a copy-and-paste of the example. Thanks! On 24 Mar, 2014, at 2:46 pm, Xiangrui Meng men...@gmail.com wrote: Hi Tsai, Could you share more information about the machine you used and the training parameters (runs, k, and iterations)? It can help solve your issues. Thanks! Best, Xiangrui On Sun, Mar 23, 2014 at 3:15 AM, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, At the reduceByKey stage, it takes a few minutes before the tasks start working. I have -Dspark.default.parallelism=127 cores (n-1). CPU/Network/IO is idling across all nodes when this is happening. And there is nothing particular in the master log file. 
From the spark-shell: 14/03/23 18:13:50 INFO TaskSetManager: Starting task 3.0:124 as TID 538 on executor 2: XXX (PROCESS_LOCAL) 14/03/23 18:13:50 INFO TaskSetManager: Serialized task 3.0:124 as 38765155 bytes in 193 ms 14/03/23 18:13:50 INFO TaskSetManager: Starting task 3.0:125 as TID 539 on executor 1: XXX (PROCESS_LOCAL) 14/03/23 18:13:50 INFO TaskSetManager: Serialized task 3.0:125 as 38765155 bytes in 96 ms 14/03/23 18:13:50 INFO TaskSetManager: Starting task 3.0:126 as TID 540 on executor 0: XXX (PROCESS_LOCAL) 14/03/23 18:13:50 INFO TaskSetManager: Serialized task 3.0:126 as 38765155 bytes in 100 ms But it stops there for some significant time before any movement. In the stage detail of the UI, I can see that there are 127 tasks running but the duration each is at least a few minutes. I'm working off local storage (not hdfs) and the kmeans data is about 6.5GB (50M rows). Is this a normal behaviour? Thanks!
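One detail in the log worth quantifying is the serialized task size. The numbers below come straight from the log lines (38765155 bytes per task, 127 tasks); whether shipping these tasks actually accounts for the delay is an assumption that the thread does not confirm.

```python
# Values taken from the TaskSetManager log lines in the thread.
serialized_task_bytes = 38765155
num_tasks = 127

# Total bytes the driver would serialize and ship if every task is this size.
total_bytes = serialized_task_bytes * num_tasks
total_gib = total_bytes / 1024**3

print(total_bytes, round(total_gib, 2))
```

Close to 5 GB of serialized tasks for a single stage is unusually large, which could suggest that a sizeable object is being captured in the task closure. That is only a hypothesis to check against the pastebin code.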
[R] Running Rmpi/OpenMPI issues
Hi, I have R 3.0.3 and OpenMPI 1.6.5. Here’s my test script:
library(snow)
nbNodes <- 4
cl <- makeCluster(nbNodes, "MPI")
clusterCall(cl, function() Sys.info()[c("nodename","machine")])
mpi.quit()
And the mpirun command:
/opt/openmpi-1.6.5-intel/bin/mpirun -np 1 -H host1,host2,host3,host4 R --no-save ~/test_mpi.R
Here’s the output:
cl <- makeCluster(nbNodes, "MPI")
Loading required package: Rmpi
4 slaves are spawned successfully. 0 failed.
clusterCall(cl, function() Sys.info()[c("nodename","machine")])
[[1]] nodename machine "host1" "x86_64"
[[2]] nodename machine "host1" "x86_64"
[[3]] nodename machine "host1" "x86_64"
[[4]] nodename machine "host1" "x86_64"
mpi.quit()
I followed the instructions from: http://www.statistik.uni-dortmund.de/useR-2008/tutorials/useR2008introhighperfR.pdf , specifically to use -np 1.
1. Why is it not running on the rest of the nodes? I can see all 4 processes on host1 and no orted daemon running. What is the correct way to run this?
2. mpi.quit() just hangs there.
Thanks! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Configuring shuffle write directory
Hi, Each of my worker nodes has its own unique spark.local.dir. However, when I run spark-shell, the shuffle writes are always written to /tmp despite the property being set when the worker node is started. Specifying spark.local.dir for the driver program seems to override the executor setting? Is there a way to properly define it in the worker node? Thanks!
Spark temp dir (spark.local.dir)
Hi, I'm confused about the -Dspark.local.dir and SPARK_WORKER_DIR(--work-dir). What's the difference? I have set -Dspark.local.dir for all my worker nodes but I'm still seeing directories being created in /tmp when the job is running. I have also tried setting -Dspark.local.dir when I run the application. Thanks!
Re: Spark temp dir (spark.local.dir)
"spark.local.dir can and should be set both on the executors and on the driver (if the driver broadcast variables, the files will be stored in this directory)" Do you mean the worker nodes? I don’t think they are from the jetty connectors, and the directories are empty:
/tmp/spark-3e330cdc-7540-4313-9f32-9fa109935f17/jars
/tmp/spark-3e330cdc-7540-4313-9f32-9fa109935f17/files
I run the application like this, even with java.io.tmpdir set:
bin/run-example -Dspark.executor.memory=14g -Dspark.local.dir=/mnt/storage1/lm -Djava.io.tmpdir=/mnt/storage1/lm org.apache.spark.examples.SparkLR spark://oct1:7077 10
On 13 Mar, 2014, at 5:33 pm, Guillaume Pitel guillaume.pi...@exensa.com wrote: Also, I think the jetty connector will create a small file or directory in /tmp regardless of the spark.local.dir. It's very small, about 10KB. Guillaume
I'm not 100% sure but I think it goes like this: spark.local.dir can and should be set both on the executors and on the driver (if the driver broadcast variables, the files will be stored in this directory). The SPARK_WORKER_DIR is where the jars and the log output of the executors are placed (default $SPARK_HOME/work/) and it should be cleaned regularly. The logs of the workers and master are found in $SPARK_HOME/logs. Guillaume
Hi, I'm confused about the -Dspark.local.dir and SPARK_WORKER_DIR(--work-dir). What's the difference? I have set -Dspark.local.dir for all my worker nodes but I'm still seeing directories being created in /tmp when the job is running. I have also tried setting -Dspark.local.dir when I run the application. Thanks!
-- Guillaume PITEL, Président +33(0)6 25 48 86 80 eXenSa S.A.S. 41, rue Périer - 92120 Montrouge - FRANCE Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05
data locality in logs
Hi, In older posts on Google Groups, there was mention of checking the logs on “preferred/non-preferred” for data locality. But I can’t seem to find this on 0.9.0 anymore? Has this been changed to “PROCESS_LOCAL” , like this: 14/02/06 13:51:45 INFO TaskSetManager: Starting task 9.0:50 as TID 568 on executor 0: xxx (PROCESS_LOCAL) What is the difference between process-local and node-local? Thanks, Liming
ClassNotFoundException: PRCombiner
Hi, While running the Bagel’s Wikipedia Page Rank example (org.apache.spark.examples.bagel.WikipediaPageRank), it is having this error at the end: org.apache.spark.SparkException: Job aborted: Task 3.0:4 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: org.apache.spark.examples.bagel.PRCombiner) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026) at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619) at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619) at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at akka.actor.ActorCell.invoke(ActorCell.scala:456) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 14/02/04 09:56:34 INFO TaskSetManager: 
Starting task 4.0:2 as TID 170 on executor 0: oct1 (NODE_LOCAL) 14/02/04 09:56:34 INFO TaskSetManager: Serialized task 4.0:2 as 1754 bytes in 0 ms 14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state RUNNING from TID 147 because its task set is gone 14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state RUNNING from TID 148 because its task set is gone 14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state FAILED from TID 147 because its task set is gone 14/02/04 09:56:34 INFO TaskSetManager: Starting task 4.0:3 as TID 171 on executor 2: oct2 (NODE_LOCAL) 14/02/04 09:56:34 INFO TaskSetManager: Serialized task 4.0:3 as 1754 bytes in 0 ms 14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state RUNNING from TID 149 because its task set is gone 14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state RUNNING from TID 150 because its task set is gone 14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state FAILED from TID 148 because its task set is gone I’m using 0.9.0. Thanks!
Re: ClassNotFoundException: PRCombiner
On 4 Feb, 2014, at 10:08 am, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, While running the Bagel’s Wikipedia Page Rank example (org.apache.spark.examples.bagel.WikipediaPageRank), it is having this error at the end: org.apache.spark.SparkException: Job aborted: Task 3.0:4 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: org.apache.spark.examples.bagel.PRCombiner) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026) at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619) at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619) at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at akka.actor.ActorCell.invoke(ActorCell.scala:456) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 14/02/04 09:56:34 INFO TaskSetManager: Starting task 4.0:2 as TID 170 on executor 0: oct1 (NODE_LOCAL) 14/02/04 09:56:34 INFO TaskSetManager: Serialized task 4.0:2 as 1754 bytes in 0 ms 14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state RUNNING from TID 147 because its task set is gone 14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state RUNNING from TID 148 because its task set is gone 14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state FAILED from TID 147 because its task set is gone 14/02/04 09:56:34 INFO TaskSetManager: Starting task 4.0:3 as TID 171 on executor 2: oct2 (NODE_LOCAL) 14/02/04 09:56:34 INFO TaskSetManager: Serialized task 4.0:3 as 1754 bytes in 0 ms 14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state RUNNING from TID 149 because its task set is gone 14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state RUNNING from TID 150 because its task set is gone 14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state FAILED from TID 148 because its task set is gone I’m using 0.9.0. Thanks! The stdout is available here: http://pastebin.com/T1wYB9mh There are a few more exceptions during the run: java.lang.ClassNotFoundException: org.apache.spark.examples.bagel.WikipediaPageRank$$anonfun$1 Regards, Liming
Re: [CentOS] Cloud Computing
Bogdan Nicolescu wrote: - Original Message From: Tsai Li Ming lt...@osgdc.org To: CentOS mailing list centos@centos.org Sent: Monday, July 20, 2009 12:18:26 AM Subject: Re: [CentOS] Cloud Computing Hi, Bogdan Nicolescu wrote: - Original Message From: Ryan J M To: CentOS mailing list Sent: Saturday, July 18, 2009 8:59:02 AM Subject: Re: [CentOS] Cloud Computing On Sat, Jul 18, 2009 at 4:36 AM, Mattwrote: Is anyone creating a cloud based on Centos yet? Ubuntu seems to be quite active there: http://www.ubuntu.com/products/whatisubuntu/serveredition/cloud/uec So there still has no CENTOS HPC solution provided yet, has upstreamer disclosed the source? ftp://ftp.redhat.com/pub/redhat/linux/beta/RHHPC still not accessable. If the centos community wish to obtain the srpms directly from us, I can provide them as rhhpc srpms are obtained from us. As expressed before to the community here and to KB, we are willing to help build and contribute to the CentOS HPC SIG. -Liming ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos Yes, please do provide the srpms thanks bn Also, we are going to start preparing ours to work with RHEL 5.4 when it is out in the coming months. Can the community wait till our 5.4 compatible version is ready. This may coincide with the Centos 5.4 release. -Liming ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] Cloud Computing
Karanbir Singh wrote: Tsai Li Ming wrote: Also, we are going to start preparing ours to work with RHEL 5.4 when it is out in the coming months. Can the community wait till our 5.4 compatible version is ready. This may coincide with the Centos 5.4 release. The last time we had this conversation there was an issue with 'your srpms' are really not the 'red hat' srpms. Has this situation changed ? KB, Our srpms[1] are given to Red Hat and thus are being rebuild by them. EPEL srpms are not given because RH takes them directly from their own epel builds. Till date, RH has not released the srpms. Community request is certainly helpful here. If you download the srpm from rhn and compare against ours, it is not the same. The md5sum will not be the same because the srpms are generated by their build system using ours. Each srpm has a redhat buildhost, signed by them, etc. However, the content is the same. If it's a centos policy to strictly use rh srpms, then we would be better off asking RH to release them to the community. Kusu/PCM is GPL v2. -Liming [1] PCM RHHPC edition srpms, since PCM has various editions. ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] Cloud Computing
Hi, Bogdan Nicolescu wrote: - Original Message From: Ryan J M sync@gmail.com To: CentOS mailing list centos@centos.org Sent: Saturday, July 18, 2009 8:59:02 AM Subject: Re: [CentOS] Cloud Computing On Sat, Jul 18, 2009 at 4:36 AM, Mattwrote: Is anyone creating a cloud based on Centos yet? Ubuntu seems to be quite active there: http://www.ubuntu.com/products/whatisubuntu/serveredition/cloud/uec So there still has no CENTOS HPC solution provided yet, has upstreamer disclosed the source? ftp://ftp.redhat.com/pub/redhat/linux/beta/RHHPC still not accessable. snip If the centos community wish to obtain the srpms directly from us, I can provide them as rhhpc srpms are obtained from us. As expressed before to the community here and to KB, we are willing to help build and contribute to the CentOS HPC SIG. -Liming ___ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] Set hostname via DHCP ?
Niki Kovacs wrote: Niki Kovacs a écrit : If I take a look at /var/lib/dhclient/dhclient-eth0.leases (on the client), here's a summary of the lease: lease { interface eth0; fixed-address 192.168.1.2; option subnet-mask 255.255.255.0; option routers 192.168.1.254; option dhcp-lease-time 86400; option dhcp-message-type 5; option domain-name-servers 62.4.16.70,62.4.17.69; option dhcp-server-identifier 192.168.1.252; option broadcast-address 192.168.1.255; option host-name raymonde; option domain-name local; renew 1 2009/6/29 17:04:30; rebind 2 2009/6/30 04:47:44; expire 2 2009/6/30 07:47:44; } Here's what 'hostname' returns: # hostname raymonde But when I go for the domain name, I get this: # hostname -d hostname: Hôte inconnu -- means 'Unknown host' in French :o) Any idea what I'm doing wrong here? OK, I think I got my mistake. When I specify a domain name (with 'option domain-name'), this is in fact what gets written to the client's /etc/resolv.conf in the 'search' line. But to handle the fully qualified domain name centrally, I have to use DNS. Correct me if I'm wrong. In our environment, our DHCP client machines (RHEL or CentOS) get their hostname by DNS lookups (not sure why). Our DHCP server generally just provides the usual configuration such as the DNS servers, search options, etc. The IP addresses in our DHCP range have both forward and reverse mappings on our DNS server, and the clients obtain the proper fully qualified host name upon DHCP request. HTH, Liming
Re: [CentOS] filesystem rpm fails when /home is NFS mounted
Scott Silva wrote: on 4-2-2009 2:00 PM Anne Wilson spake the following: On Thursday 02 April 2009 21:40:59 R P Herrold wrote: On Wed, 1 Apr 2009, Paul Heinlein wrote: I don't know if it's a bug or a feature, but the filesystem-2.4.0-2.el5.centos rpm won't upgrade cleanly if /home is an NFS filesystem. I confirm this is present in 5.3 where /home is an NFS mount, and that I missed it in testing. A workaround is: 1. Boot into single user mode. 2. run: /sbin/service network start 3. run: yum -y update filesystem If your system emitted the warning, but did not 'bail', it is safe to retrieve the rpm locally, and to run: # rpm -Uvh filesystem*rpm --force as there are no scripts in play: [herr...@centos-5 ~]$ sudo rpm -q --scripts filesystem [herr...@centos-5 ~]$ The cause is the NFS root_squash being in effect when a NFS overmount is on a mountpoint, it seems. /home happens to express it It seems Paul and I are the last two users of NFS mounted /home left. I have /home exported and ran the upgrade from this laptop over the network, where that directory is mounted and displayed in a folderview under KDE4. I had no problems whatsoever. Is this the sort of situation you mean? Anne The way I read it was their /home was mounted on NFS, not just exported. I had a problem with /mnt or /media too, with a mounted ISO. I had to umount the ISO before the filesystem rpm could be updated. This happened when I ran yum update to RHEL 5.3 recently. -Liming
Re: [CentOS] filesystem rpm fails when /home is NFS mounted
R P Herrold wrote: Thank you for the confirmation, I have not had a chance to file in the centos tracker yet, and hope to get it filed during tomorrow's business hours. Similarly I have not checked upstream's tracker yet. If needed, I'll file there as well, but I cannot imagine it will be deemed so critical as to pull a fast fix. Found this on upstream bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=483071 -Liming
Re: [Kusu-users] repopatch
Hi J, Which kernel updates are you interested in? There are entries in the database that are related to the updated kernels. FYI, kernel-xen is not used right now. -Liming Jay wrote: How do I select only some of the kernel updates found when running repopatch? Or do I have to dissect the update kit after the fact? Thanks, J ___ Kusu-users mailing list Kusu-users@osgdc.org http://mail.osgdc.org/mailman/listinfo/kusu-users
[Mailman-Users] Getting notified for subscription bounce
Hi, Is it possible for the list owner to be notified when a confirmation email cannot be delivered to a subscriber, usually because of a bounce or a 550? I have the following logs in my postfix but the owner is not getting any bounces:
Nov 11 13:43:46 mail postfix/local[19345]: 313C22FED3: to=[EMAIL PROTECTED], orig_to=[EMAIL PROTECTED], relay=local, delay=0, status=sent (delivered to command: /usr/lib/mailman/mail/mailman bounces listname)
Nov 11 13:43:46 mail postfix/qmgr[19329]: 313C22FED3: removed
I have the following settings in my bounce processing section:
bounce processing: Yes
bounce_score_threshold: 5.0
bounce_info_stale_after: 7
bounce_you_are_disabled_warnings: 3
bounce_you_are_disabled_warnings_interval: 7
bounce_unrecognized_goes_to_list_owner: Yes
bounce_notify_owner_on_disable: Yes
bounce_notify_owner_on_removal: Yes
Sending to [EMAIL PROTECTED] does generate an "Uncaught bounce notification".
Regards, Liming -- Mailman-Users mailing list Mailman-Users@python.org http://mail.python.org/mailman/listinfo/mailman-users Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/ Unsubscribe: http://mail.python.org/mailman/options/mailman-users/archive%40jab.org Security Policy: http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.027.htp
dns lookup for reverse proxy
Dear all,
I have the following directives in my conf file.
<IfModule mod_proxy.c>
ProxyRequests Off
RewriteEngine On
ProxyPass /Server/ http://localhost:8081
ProxyPassReverse /Server/ http://localhost:8081
RewriteRule ^/Server$ /Server/ [P]
</IfModule>
My error log:
[Wed Sep 22 02:59:41 2004] [error] [client 192.168.1.22] proxy: DNS lookup failure for: example2 returned by /Server
Does a forced proxy require a DNS lookup on the httpd server itself? When I change from [P] to [R], it works, but our client (an Axis client) does not understand an HTTP 302.
-Liming
Re: dns lookup for reverse proxy
It didn't work, because it's not under the DocumentRoot.
[Wed Sep 22 03:30:14 2004] [error] [client 192.168.1.22] File does not exist: /var/www/html/Server
-Liming
Ian Holsman wrote:
Try something like
<IfModule mod_proxy.c>
ProxyPass /Server/ http://localhost:8081/
ProxyPassReverse /Server/ http://localhost:8081/
</IfModule>
You shouldn't need the other stuff to make it work.
On 22/09/2004, at 1:07 PM, Tsai Li Ming wrote:
Dear all,
I have the following directives in my conf file.
<IfModule mod_proxy.c>
ProxyRequests Off
RewriteEngine On
ProxyPass /Server/ http://localhost:8081
ProxyPassReverse /Server/ http://localhost:8081
RewriteRule ^/Server$ /Server/ [P]
</IfModule>
My error log:
[Wed Sep 22 02:59:41 2004] [error] [client 192.168.1.22] proxy: DNS lookup failure for: example2 returned by /Server
Does a forced proxy require a DNS lookup on the httpd server itself? When I change from [P] to [R], it works, but our client (an Axis client) does not understand an HTTP 302.
-Liming
--
Ian Holsman Director Network Management Systems CNET Networks PH: 415-344-2608 (USA) /(++61) 3-9818-0132 (Australia)
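The suggested ProxyPass lines above differ from the original ones only in the trailing slash on the target URL. As a rough illustration of why that can matter, here is a minimal Python sketch of prefix substitution, which is how ProxyPass mappings are commonly described: the matched path prefix is replaced by the target URL and the rest of the path is appended. This is a simplified model under that assumption, not Apache's actual code; `proxy_pass` is a hypothetical helper, and `/Server/example2` is an assumed request path chosen to echo the name in the error log.

```python
def proxy_pass(path, prefix, target):
    """Simplified model of a ProxyPass mapping (hypothetical helper).

    Replaces the matched path prefix with the target URL and appends
    whatever follows the prefix. Returns None when the path does not match.
    """
    if not path.startswith(prefix):
        return None  # request is not handled by this mapping
    return target + path[len(prefix):]


# Target without a trailing slash: the remainder is glued onto the port.
no_slash = proxy_pass("/Server/example2", "/Server/", "http://localhost:8081")

# Target with a trailing slash: the remainder becomes a proper URL path.
with_slash = proxy_pass("/Server/example2", "/Server/", "http://localhost:8081/")

print(no_slash)
print(with_slash)
```

In this model the first mapping produces a malformed backend URL, because nothing separates the port from the rest of the path. That may or may not be what caused the DNS lookup failure in the log, so treat it as one plausible reading rather than a diagnosis.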
[Samba] cp input/output error
Hi,
I have been getting random input/output errors when trying to cp an ISO (100 MB) to a Samba mount point. I get the same random error when I try to cp a txt file over too.
cp: writing `/public/cd.iso': Input/output error
My fstab:
//fserv/public /public smbfs fmask=666,username=,password= 1
Thanks, Liming
--
To unsubscribe from this list go to the following URL and read the instructions: http://lists.samba.org/mailman/listinfo/samba