[jira] [Comment Edited] (SPARK-15039) Kinesis receiver does not work in Yarn

2016-05-03 Thread Tsai Li Ming (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268675#comment-15268675
 ] 

Tsai Li Ming edited comment on SPARK-15039 at 5/3/16 1:16 PM:
--

[~zsxwing] Nothing suspicious in the logs. The streaming tab shows 1 receiver but 
0 events/sec.

[~jerryshao] Have not tested standalone mode, but `--master local[*]` works. 

I will test in Standalone mode.


was (Author: ltsai):
[~zsxwing] Nothing suspicious in the logs. The streaming tab shows 1 receiver 
but 0 events/sec.

[~jerryshao] Have not tested standalone mode, but `--master local[*]` works. 



> Kinesis receiver does not work in Yarn
> --
>
> Key: SPARK-15039
> URL: https://issues.apache.org/jira/browse/SPARK-15039
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.6.0
> Environment: YARN
> HDP 2.4.0
>Reporter: Tsai Li Ming
>
> Hi,
> Using the pyspark kinesis example, it does not receive any messages from 
> Kinesis when submitting to a YARN cluster, though it is working fine when 
> using local mode. 
> {code}
> spark-submit \
> --executor-cores 4 \
> --num-executors 4 \
> --packages 
> com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
>  
> {code}
> I had to downgrade the package to 1.5.1; 1.6.1 does not work either. 
> Not sure whether this is related to SPARK-12453
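
For reference, a minimal sketch of how a Kinesis receiver stream is typically wired up with the 1.5/1.6 spark-streaming-kinesis-asl Scala API (the application name, stream name, endpoint and region below are placeholders, not values from this report). Note that each receiver occupies one executor core, so the cluster needs more total cores than receivers for processing to make progress:

{code}
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream

val conf = new SparkConf().setAppName("KinesisSketch")
val ssc = new StreamingContext(conf, Seconds(10))

// Placeholder application/stream/endpoint/region values.
val stream = KinesisUtils.createStream(
  ssc, "kinesis-sketch-app", "my-stream",
  "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
  InitialPositionInStream.LATEST, Seconds(10), StorageLevel.MEMORY_AND_DISK_2)

stream.map(bytes => new String(bytes)).print()
ssc.start()
ssc.awaitTermination()
{code}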






[jira] [Commented] (SPARK-15039) Kinesis receiver does not work in Yarn

2016-05-03 Thread Tsai Li Ming (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268675#comment-15268675
 ] 

Tsai Li Ming commented on SPARK-15039:
--

[~zsxwing] Nothing suspicious in the logs. The streaming tab shows 1 receiver 
but 0 events/sec.

[~jerryshao] Have not tested standalone mode, but `--master local[*]` works. 



> Kinesis receiver does not work in Yarn
> --
>
> Key: SPARK-15039
> URL: https://issues.apache.org/jira/browse/SPARK-15039
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.6.0
> Environment: YARN
> HDP 2.4.0
>Reporter: Tsai Li Ming
>
> Hi,
> Using the pyspark kinesis example, it does not receive any messages from 
> Kinesis when submitting to a YARN cluster, though it is working fine when 
> using local mode. 
> {code}
> spark-submit \
> --executor-cores 4 \
> --num-executors 4 \
> --packages 
> com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
>  
> {code}
> I had to downgrade the package to 1.5.1; 1.6.1 does not work either. 
> Not sure whether this is related to SPARK-12453






[jira] [Updated] (SPARK-15039) Kinesis receiver does not work in Yarn

2016-04-30 Thread Tsai Li Ming (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsai Li Ming updated SPARK-15039:
-
Description: 
Hi,

Using the pyspark kinesis example, it does not receive any messages from 
Kinesis when submitting to a YARN cluster, though it is working fine when using 
local mode. 

{code}
spark-submit \
--executor-cores 4 \
--num-executors 4 \
--packages 
com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
 
{code}

I had to downgrade the package to 1.5.1; 1.6.1 does not work either. 

Not sure whether this is related to SPARK-12453

  was:
Hi,

Using the pyspark kinesis example, it does not receive any messages from 
Kinesis when submitting to a YARN cluster, though it is working fine when using 
local mode. 

{code}
spark-submit \
--executor-cores 4 \
--num-executors 4 \
--packages 
com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
 
{code}

I had to downgrade the package to 1.5.1; 1.6.1 does not work either. 


> Kinesis receiver does not work in Yarn
> --
>
> Key: SPARK-15039
> URL: https://issues.apache.org/jira/browse/SPARK-15039
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.6.0
> Environment: YARN
> HDP 2.4.0
>Reporter: Tsai Li Ming
>
> Hi,
> Using the pyspark kinesis example, it does not receive any messages from 
> Kinesis when submitting to a YARN cluster, though it is working fine when 
> using local mode. 
> {code}
> spark-submit \
> --executor-cores 4 \
> --num-executors 4 \
> --packages 
> com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
>  
> {code}
> I had to downgrade the package to 1.5.1; 1.6.1 does not work either. 
> Not sure whether this is related to SPARK-12453






[jira] [Updated] (SPARK-15039) Kinesis receiver does not work in Yarn

2016-04-30 Thread Tsai Li Ming (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsai Li Ming updated SPARK-15039:
-
Description: 
Hi,

Using the pyspark kinesis example, it does not receive any messages from 
Kinesis when submitting to a YARN cluster, though it is working fine when using 
local mode. 

{code}
spark-submit \
--executor-cores 4 \
--num-executors 4 \
--packages 
com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
 
{code}

I had to downgrade the package to 1.5.1; 1.6.1 does not work either. 

  was:
Hi,

Using the pyspark kinesis example, it does not receive any messages from 
Kinesis when submitting to a YARN cluster, though it is working when using 
local mode. 

```
spark-submit \
--executor-cores 4 \
--num-executors 4 \
--packages 
com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
 
```

I had to downgrade the package to 1.5.1 before it could work. 


> Kinesis receiver does not work in Yarn
> --
>
> Key: SPARK-15039
> URL: https://issues.apache.org/jira/browse/SPARK-15039
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.6.0
> Environment: YARN
> HDP 2.4.0
>Reporter: Tsai Li Ming
>
> Hi,
> Using the pyspark kinesis example, it does not receive any messages from 
> Kinesis when submitting to a YARN cluster, though it is working fine when 
> using local mode. 
> {code}
> spark-submit \
> --executor-cores 4 \
> --num-executors 4 \
> --packages 
> com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
>  
> {code}
> I had to downgrade the package to 1.5.1; 1.6.1 does not work either. 






[jira] [Created] (SPARK-15039) Kinesis receiver does not work in Yarn

2016-04-30 Thread Tsai Li Ming (JIRA)
Tsai Li Ming created SPARK-15039:


 Summary: Kinesis receiver does not work in Yarn
 Key: SPARK-15039
 URL: https://issues.apache.org/jira/browse/SPARK-15039
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.6.0
 Environment: YARN
HDP 2.4.0
Reporter: Tsai Li Ming


Hi,

Using the pyspark kinesis example, it does not receive any messages from 
Kinesis when submitting to a YARN cluster, though it is working when using 
local mode. 

```
spark-submit \
--executor-cores 4 \
--num-executors 4 \
--packages 
com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
 
```

I had to downgrade the package to 1.5.1 before it could work. 






[jira] [Commented] (SPARK-3220) K-Means clusterer should perform K-Means initialization in parallel

2016-02-10 Thread Tsai Li Ming (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140623#comment-15140623
 ] 

Tsai Li Ming commented on SPARK-3220:
-

I built Derrick's kmeans against Spark 1.6.0 and ran

{code}
import com.massivedatascience.clusterer.KMeans
val clusters = KMeans.train(parsedData, numClusters, numIterations)
{code}

It took 41 minutes with the same dataset/settings, compared to 1 hour using 
MLlib. In both cases, there was enough memory to cache everything.
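
For comparison, a minimal sketch of the MLlib call being benchmarked against, using the same variable names as the snippet above (parsedData is assumed to be an RDD[Vector]; this is an illustration, not the exact benchmark code):

{code}
import org.apache.spark.mllib.clustering.{KMeans => MLlibKMeans}
import org.apache.spark.mllib.linalg.Vectors

// parsedData is assumed to be an RDD[Vector], e.g.
// val parsedData = sc.textFile("data.txt")
//   .map(line => Vectors.dense(line.split(' ').map(_.toDouble))).cache()
val mllibModel = MLlibKMeans.train(parsedData, numClusters, numIterations)
println(mllibModel.clusterCenters.length)
{code}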

> K-Means clusterer should perform K-Means initialization in parallel
> ---
>
> Key: SPARK-3220
> URL: https://issues.apache.org/jira/browse/SPARK-3220
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Derrick Burns
>  Labels: clustering
>
> The LocalKMeans method should be replaced with a parallel implementation.  As 
> it stands now, it becomes a bottleneck for large data sets. 
> I have implemented this functionality in my version of the clusterer.  
> However, I see that there are hundreds of outstanding pull requests.  If 
> someone on the team wants to sponsor the pull request, I will create one.  
> Otherwise, I will just maintain my own private fork of the clusterer.






[jira] [Comment Edited] (SPARK-3220) K-Means clusterer should perform K-Means initialization in parallel

2016-02-10 Thread Tsai Li Ming (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140623#comment-15140623
 ] 

Tsai Li Ming edited comment on SPARK-3220 at 2/10/16 11:01 AM:
---

I built Derrick's kmeans against Spark 1.6.0 and ran

{code}
import com.massivedatascience.clusterer.KMeans
val clusters = KMeans.train(parsedData, numClusters, numIterations)
{code}

It took 41 minutes with the same dataset/settings, compared to 1 hour using MLlib, 
though it slowed down during the _reduceByKeyLocally_ phase. In both cases, there 
was enough memory to cache everything.


was (Author: ltsai):
I built Derrick's kmeans against Spark 1.6.0 and ran

{code}
import com.massivedatascience.clusterer.KMeans
val clusters = KMeans.train(parsedData, numClusters, numIterations)
{code}

It took 41 minutes with the same dataset/settings, compared to 1 hour using 
MLlib. In both cases, there was enough memory to cache everything.

> K-Means clusterer should perform K-Means initialization in parallel
> ---
>
> Key: SPARK-3220
> URL: https://issues.apache.org/jira/browse/SPARK-3220
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Derrick Burns
>  Labels: clustering
>
> The LocalKMeans method should be replaced with a parallel implementation.  As 
> it stands now, it becomes a bottleneck for large data sets. 
> I have implemented this functionality in my version of the clusterer.  
> However, I see that there are hundreds of outstanding pull requests.  If 
> someone on the team wants to sponsor the pull request, I will create one.  
> Otherwise, I will just maintain my own private fork of the clusterer.






[jira] [Commented] (SPARK-3220) K-Means clusterer should perform K-Means initialization in parallel

2016-02-09 Thread Tsai Li Ming (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140434#comment-15140434
 ] 

Tsai Li Ming commented on SPARK-3220:
-

[~derrickburns], Is your private fork at 
https://github.com/derrickburns/generalized-kmeans-clustering ?

I am having the same problem here:
http://apache-spark-developers-list.1001551.n3.nabble.com/Re-Kmeans-using-1-core-only-Was-Slowness-in-Kmeans-calculating-fastSquaredDistance-td16304.html



> K-Means clusterer should perform K-Means initialization in parallel
> ---
>
> Key: SPARK-3220
> URL: https://issues.apache.org/jira/browse/SPARK-3220
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Derrick Burns
>  Labels: clustering
>
> The LocalKMeans method should be replaced with a parallel implementation.  As 
> it stands now, it becomes a bottleneck for large data sets. 
> I have implemented this functionality in my version of the clusterer.  
> However, I see that there are hundreds of outstanding pull requests.  If 
> someone on the team wants to sponsor the pull request, I will create one.  
> Otherwise, I will just maintain my own private fork of the clusterer.






The usage of OpenBLAS

2015-06-26 Thread Tsai Li Ming
Hi,

I found out that the instructions for OpenBLAS has been changed by the author 
of netlib-java in:
https://github.com/apache/spark/pull/4448 since Spark 1.3.0

In that PR, I asked whether there’s still a need to compile OpenBLAS with 
USE_THREAD=0, and also about Intel MKL.

Is it still applicable or no longer the case anymore?

Thanks,
Liming





Issues building 1.4.0 using make-distribution

2015-06-17 Thread Tsai Li Ming
Hi,

I downloaded the source from Downloads page and ran the make-distribution.sh 
script.

# ./make-distribution.sh --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests 
clean package

The script has “-x” set in the beginning.

++ /tmp/a/spark-1.4.0/build/mvn help:evaluate -Dexpression=project.version 
-Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package
++ grep -v INFO
++ tail -n 1
+ VERSION='[WARNING] See 
http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin'
++ /tmp/a/spark-1.4.0/build/mvn help:evaluate -Dexpression=scala.binary.version 
-Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package
++ grep -v INFO
++ tail -n 1
+ SCALA_VERSION='[WARNING] See 
http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin'
++ /tmp/a/spark-1.4.0/build/mvn help:evaluate -Dexpression=hadoop.version 
-Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package
++ grep -v INFO
++ tail -n 1

…

+ TARDIR_NAME='spark-[WARNING] See 
http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin-bin-[WARNING] See 
http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin'
+ TARDIR='/tmp/a/spark-1.4.0/spark-[WARNING] See 
http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin-bin-[WARNING] See 
http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin'
+ rm -rf '/tmp/a/spark-1.4.0/spark-[WARNING] See 
http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin-bin-[WARNING] See 
http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin'
+ cp -r /tmp/a/spark-1.4.0/dist '/tmp/a/spark-1.4.0/spark-[WARNING] See 
http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin-bin-[WARNING] See 
http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin'
cp: cannot create directory `/tmp/a/spark-1.4.0/spark-[WARNING] See 
http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin-bin-[WARNING] See 
http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin': No such file or 
directory


The dist directory seems complete and does work.

Thanks,
Liming






Documentation for external shuffle service in 1.4.0

2015-06-17 Thread Tsai Li Ming
Hi,

I can’t seem to find any documentation on this feature in 1.4.0?

Regards,
Liming





Re: Not getting event logs >= spark 1.3.1

2015-06-16 Thread Tsai Li Ming
Forgot to mention this is in standalone mode.

Is my configuration wrong?

Thanks,
Liming

On 15 Jun, 2015, at 11:26 pm, Tsai Li Ming mailingl...@ltsai.com wrote:

 Hi,
 
 I have this in my spark-defaults.conf (same for hdfs):
 spark.eventLog.enabled  true
 spark.eventLog.dir  file:/tmp/spark-events
 spark.history.fs.logDirectory   file:/tmp/spark-events
 
 While the app is running, there is a “.inprogress” directory. However when 
 the job completes, the directory is always empty.
 
 I’m submitting the job like this, using either the Pi or word count examples:
 $ bin/spark-submit 
 /opt/spark-1.4.0-bin-hadoop2.6/examples/src/main/python/wordcount.py 
 
 This used to work in 1.2.1; I didn’t test 1.3.0.
 
 
 Regards,
 Liming
 
 
 
 
 
 
 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org
 





Not getting event logs >= spark 1.3.1

2015-06-15 Thread Tsai Li Ming
Hi,

I have this in my spark-defaults.conf (same for hdfs):
spark.eventLog.enabled  true
spark.eventLog.dir  file:/tmp/spark-events
spark.history.fs.logDirectory   file:/tmp/spark-events

While the app is running, there is a “.inprogress” directory. However when the 
job completes, the directory is always empty.

I’m submitting the job like this, using either the Pi or word count examples:
$ bin/spark-submit 
/opt/spark-1.4.0-bin-hadoop2.6/examples/src/main/python/wordcount.py 

This used to work in 1.2.1; I didn’t test 1.3.0.
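
For completeness, a minimal sketch of the same settings applied programmatically (same keys and values as the spark-defaults.conf entries above):

import org.apache.spark.{SparkConf, SparkContext}

// Same keys as in spark-defaults.conf above.
val conf = new SparkConf()
  .setAppName("event-log-sketch")
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "file:/tmp/spark-events")
val sc = new SparkContext(conf)
sc.parallelize(1 to 10).count()
sc.stop()  // event logs are finalized (renamed from .inprogress) when the context stops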


Regards,
Liming









Re: Logstash as a source?

2015-02-01 Thread Tsai Li Ming
I have been using a logstash alternative - fluentd to ingest the data into hdfs.

I had to configure fluentd to not append the data so that spark streaming will 
be able to pick up the new logs.
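
For illustration, a minimal sketch of the file-based pickup described above, assuming the fluentd output lands under an HDFS directory (the path is a placeholder):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("fluentd-dir-sketch")
val ssc = new StreamingContext(conf, Seconds(30))

// textFileStream only picks up files newly created in the directory, which is
// why fluentd was configured not to append to existing files.
val lines = ssc.textFileStream("hdfs:///logs/fluentd/")
lines.count().print()
ssc.start()
ssc.awaitTermination()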

-Liming


On 2 Feb, 2015, at 6:05 am, NORD SC jan.algermis...@nordsc.com wrote:

 Hi,
 
 I plan to have logstash send log events (as key value pairs) to spark 
 streaming using Spark on Cassandra.
 
 Being completely fresh to Spark, I have a couple of questions:
 
 - is that a good idea at all, or would it be better to put e.g. Kafka in 
 between to handle traffic peeks
  (IOW: how and how well would Spark Streaming handle peeks?)
 
 - Is there already a logstash-source implementation for Spark Streaming 
 
 - assuming there is none yet and assuming it is a good idea: I’d dive into 
 writing it myself - what would the core advice be to avoid biginner traps?
 
 Jan
 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org
 





Re: Confused why I'm losing workers/executors when writing a large file to S3

2015-01-21 Thread Tsai Li Ming
I’m getting the same issue on Spark 1.2.0. Despite having set 
“spark.core.connection.ack.wait.timeout” in spark-defaults.conf and verified in 
the job UI (port 4040) environment tab, I still get the “no heartbeat in 60 
seconds” error. 

spark.core.connection.ack.wait.timeout=3600

15/01/22 07:29:36 WARN master.Master: Removing 
worker-20150121231529-numaq1-4-34948 because we got no heartbeat in 60 seconds
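
For reference, a sketch collecting the fully specified settings quoted in this thread into one SparkConf (values are the ones quoted here, not recommendations). The 60-second figure in the master log appears to be the standalone master's own worker heartbeat timeout (spark.worker.timeout), which is separate from these executor-side settings:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("timeout-settings-sketch")
  .set("spark.core.connection.ack.wait.timeout", "3600")
  .set("spark.akka.timeout", "300")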


On 14 Nov, 2014, at 3:04 pm, Reynold Xin r...@databricks.com wrote:

 Darin,
 
 You might want to increase these config options also:
 
 spark.akka.timeout 300
 spark.storage.blockManagerSlaveTimeoutMs 30
 
 On Thu, Nov 13, 2014 at 11:31 AM, Darin McBeath ddmcbe...@yahoo.com.invalid 
 wrote:
 For one of my Spark jobs, my workers/executors are dying and leaving the 
 cluster.
 
 On the master, I see something like the following in the log file.  I'm 
 surprised to see the '60' seconds in the master log below because I 
 explicitly set it to '600' (or so I thought) in my spark job (see below).   
 This is happening at the end of my job when I'm trying to persist a large RDD 
 (probably around 300+GB) back to S3 (in 256 partitions).  My cluster consists 
 of 6 r3.8xlarge machines.  The job successfully works when I'm outputting 
 100GB or 200GB.
 
 If  you have any thoughts/insights, it would be appreciated. 
 
 Thanks.
 
 Darin.
 
 Here is where I'm setting the 'timeout' in my spark job.
 
 SparkConf conf = new SparkConf()
 .setAppName(SparkSync Application)
 .set(spark.serializer, org.apache.spark.serializer.KryoSerializer)
 .set(spark.rdd.compress,true)   
 .set(spark.core.connection.ack.wait.timeout,600);
 ​
 On the master, I see the following in the log file.
 
 4/11/13 17:20:39 WARN master.Master: Removing 
 worker-20141113134801-ip-10-35-184-232.ec2.internal-51877 because we got no 
 heartbeat in 60 seconds
 14/11/13 17:20:39 INFO master.Master: Removing worker 
 worker-20141113134801-ip-10-35-184-232.ec2.internal-51877 on 
 ip-10-35-184-232.ec2.internal:51877
 14/11/13 17:20:39 INFO master.Master: Telling app of lost executor: 2
 
 On a worker, I see something like the following in the log file.
 
 14/11/13 17:20:58 WARN util.AkkaUtils: Error sending message in 1 attempts
 java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
   at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
   at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
   at 
 scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
   at scala.concurrent.Await$.result(package.scala:107)
   at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
   at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:362)
 14/11/13 17:21:11 INFO httpclient.HttpMethodDirector: I/O exception 
 (java.net.SocketException) caught when processing request: Broken pipe
 14/11/13 17:21:11 INFO httpclient.HttpMethodDirector: Retrying request
 14/11/13 17:21:32 INFO httpclient.HttpMethodDirector: I/O exception 
 (java.net.SocketException) caught when processing request: Broken pipe
 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: Retrying request
 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: I/O exception 
 (java.io.IOException) caught when processing request: Resetting to invalid 
 mark
 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: Retrying request
 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: I/O exception 
 (java.io.IOException) caught when processing request: Resetting to invalid 
 mark
 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: Retrying request
 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: I/O exception 
 (java.io.IOException) caught when processing request: Resetting to invalid 
 mark
 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: Retrying request
 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: I/O exception 
 (java.io.IOException) caught when processing request: Resetting to invalid 
 mark
 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: Retrying request
 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: I/O exception 
 (java.io.IOException) caught when processing request: Resetting to invalid 
 mark
 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: Retrying request
 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: I/O exception 
 (java.io.IOException) caught when processing request: Resetting to invalid 
 mark
 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: Retrying request
 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: I/O exception 
 (java.io.IOException) caught when processing request: Resetting to invalid 
 mark
 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: Retrying request
 14/11/13 17:21:34 INFO httpclient.HttpMethodDirector: I/O exception 
 (java.io.IOException) caught when processing request: Resetting to invalid 
 mark
 14/11/13 17:21:34 INFO 

Understanding stages in WebUI

2014-11-25 Thread Tsai Li Ming
Hi,

I have the classic word count example:
 file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).collect()

From the Job UI, I can only see 2 stages: 0-collect and 1-map.

What happened to the ShuffledRDD in reduceByKey? And why are the flatMap and map 
operations collapsed into a single stage?
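
For inspection, a minimal sketch that prints the lineage of the same word count; the ShuffledRDD created by reduceByKey sits on top of the pipelined (narrow) flatMap/map chain, which is why those show up as a single stage in the UI:

val counts = file.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

// toDebugString shows the ShuffledRDD on top of the pipelined MappedRDD/FlatMappedRDD chain.
println(counts.toDebugString)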

14/11/25 16:02:35 INFO SparkContext: Starting job: collect at console:15
14/11/25 16:02:35 INFO DAGScheduler: Registering RDD 6 (map at console:15)
14/11/25 16:02:35 INFO DAGScheduler: Got job 0 (collect at console:15) with 2 
output partitions (allowLocal=false)
14/11/25 16:02:35 INFO DAGScheduler: Final stage: Stage 0(collect at 
console:15)
14/11/25 16:02:35 INFO DAGScheduler: Parents of final stage: List(Stage 1)
14/11/25 16:02:35 INFO DAGScheduler: Missing parents: List(Stage 1)
14/11/25 16:02:35 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[6] at map at 
console:15), which has no missing parents
14/11/25 16:02:35 INFO MemoryStore: ensureFreeSpace(3464) called with 
curMem=163705, maxMem=278302556
14/11/25 16:02:35 INFO MemoryStore: Block broadcast_1 stored as values in 
memory (estimated size 3.4 KB, free 265.3 MB)
14/11/25 16:02:35 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 
(MappedRDD[6] at map at console:15)
14/11/25 16:02:35 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/11/25 16:02:35 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, 
localhost, PROCESS_LOCAL, 1208 bytes)
14/11/25 16:02:35 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, 
localhost, PROCESS_LOCAL, 1208 bytes)
14/11/25 16:02:35 INFO Executor: Running task 0.0 in stage 1.0 (TID 0)
14/11/25 16:02:35 INFO Executor: Running task 1.0 in stage 1.0 (TID 1)
14/11/25 16:02:35 INFO HadoopRDD: Input split: 
file:/Users/ltsai/Downloads/spark-1.1.0-bin-hadoop2.4/README.md:0+2405
14/11/25 16:02:35 INFO HadoopRDD: Input split: 
file:/Users/ltsai/Downloads/spark-1.1.0-bin-hadoop2.4/README.md:2405+2406
14/11/25 16:02:35 INFO deprecation: mapred.tip.id is deprecated. Instead, use 
mapreduce.task.id
14/11/25 16:02:35 INFO deprecation: mapred.task.id is deprecated. Instead, use 
mapreduce.task.attempt.id
14/11/25 16:02:35 INFO deprecation: mapred.task.is.map is deprecated. Instead, 
use mapreduce.task.ismap
14/11/25 16:02:35 INFO deprecation: mapred.job.id is deprecated. Instead, use 
mapreduce.job.id
14/11/25 16:02:35 INFO deprecation: mapred.task.partition is deprecated. 
Instead, use mapreduce.task.partition
14/11/25 16:02:36 INFO Executor: Finished task 0.0 in stage 1.0 (TID 0). 1869 
bytes result sent to driver
14/11/25 16:02:36 INFO Executor: Finished task 1.0 in stage 1.0 (TID 1). 1869 
bytes result sent to driver
14/11/25 16:02:36 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) 
in 536 ms on localhost (1/2)
14/11/25 16:02:36 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) 
in 529 ms on localhost (2/2)
14/11/25 16:02:36 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have 
all completed, from pool 
14/11/25 16:02:36 INFO DAGScheduler: Stage 1 (map at console:15) finished in 
0.562 s
14/11/25 16:02:36 INFO DAGScheduler: looking for newly runnable stages
14/11/25 16:02:36 INFO DAGScheduler: running: Set()
14/11/25 16:02:36 INFO DAGScheduler: waiting: Set(Stage 0)
14/11/25 16:02:36 INFO DAGScheduler: failed: Set()
14/11/25 16:02:36 INFO DAGScheduler: Missing parents for Stage 0: List()
14/11/25 16:02:36 INFO DAGScheduler: Submitting Stage 0 (ShuffledRDD[7] at 
reduceByKey at console:15), which is now runnable
14/11/25 16:02:36 INFO MemoryStore: ensureFreeSpace(2112) called with 
curMem=167169, maxMem=278302556
14/11/25 16:02:36 INFO MemoryStore: Block broadcast_2 stored as values in 
memory (estimated size 2.1 KB, free 265.2 MB)
14/11/25 16:02:36 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 
(ShuffledRDD[7] at reduceByKey at console:15)
14/11/25 16:02:36 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/11/25 16:02:36 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 2, 
localhost, PROCESS_LOCAL, 948 bytes)
14/11/25 16:02:36 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 3, 
localhost, PROCESS_LOCAL, 948 bytes)
14/11/25 16:02:36 INFO Executor: Running task 0.0 in stage 0.0 (TID 2)
14/11/25 16:02:36 INFO Executor: Running task 1.0 in stage 0.0 (TID 3)
14/11/25 16:02:36 INFO BlockFetcherIterator$BasicBlockFetcherIterator: 
maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/11/25 16:02:36 INFO BlockFetcherIterator$BasicBlockFetcherIterator: 
maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/11/25 16:02:36 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 
2 non-empty blocks out of 2 blocks
14/11/25 16:02:36 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 
2 non-empty blocks out of 2 blocks
14/11/25 16:02:36 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 
0 remote fetches in 5 ms
14/11/25 16:02:36 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 
0 

RDD memory and storage level option

2014-11-20 Thread Tsai Li Ming
Hi,

This is on version 1.1.0.

I did a simple test on the MEMORY_AND_DISK storage level.

 import org.apache.spark.storage.StorageLevel
 var file = sc.textFile("file:///path/to/file.txt").persist(StorageLevel.MEMORY_AND_DISK)
 file.count()

The file is 1.5GB and there is only 1 worker. I have requested 1GB of 
worker memory per node:

  
 ID                       Name         Cores  Memory per Node  Submitted Time       User  State    Duration
 app-20141120193912-0002  Spark shell  64     1024.0 MB        2014/11/20 19:39:12  root  RUNNING  6.0 min 


After doing a simple count, the job web ui indicates the entire file is saved 
on disk?

   RDD Name                  Storage Level                  Cached Partitions  Fraction Cached  Size in Memory  Size in Tachyon  Size on Disk
   file:///path/to/file.txt  Disk Serialized 1x Replicated  46                 100%             0.0 B           0.0 B            1476.5 MB
  
 
1. Shouldn’t some partitions be saved into memory? 




2. If I run with the MEMORY_ONLY option, I can save some partitions into memory, 
but according to the executor page there is still space left (220.6 MB / 530.3 MB) 
that is not fully used up? Each partition is about 73MB.

   RDD Name                  Storage Level                      Cached Partitions  Fraction Cached  Size in Memory  Size in Tachyon  Size on Disk
   file:///path/to/file.txt  Memory Deserialized 1x Replicated  3                  7%               220.6 MB        0.0 B            0.0 B

   Executor ID  Address       RDD Blocks  Memory Used          Disk Used  Active Tasks  Failed Tasks  Complete Tasks  Total Tasks  Task Time  Input      Shuffle Read  Shuffle Write
   0            foo.co:48660  3           220.6 MB / 530.3 MB  0.0 B      0             0             46              46           14.2 m     1457.4 MB  0.0 B         0.0 B

14/11/20 19:53:22 INFO BlockManagerInfo: Added rdd_1_22 in memory on 
foo.co:48660 (size: 73.6 MB, free: 309.6 MB)
14/11/20 19:53:22 INFO TaskSetManager: Finished task 22.0 in stage 0.0 (TID 22) 
in 29833 ms on foo.co (43/46)
14/11/20 19:53:24 INFO TaskSetManager: Finished task 33.0 in stage 0.0 (TID 33) 
in 31502 ms on foo.co (44/46)
14/11/20 19:53:24 INFO TaskSetManager: Finished task 24.0 in stage 0.0 (TID 24) 
in 31651 ms on foo.co (45/46)
14/11/20 19:53:24 INFO TaskSetManager: Finished task 14.0 in stage 0.0 (TID 14) 
in 31782 ms on foo.co (46/46)
14/11/20 19:53:24 INFO DAGScheduler: Stage 0 (count at console:16) finished 
in 31.818 s
14/11/20 19:53:24 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have 
all completed, from pool 
14/11/20 19:53:24 INFO SparkContext: Job finished: count at console:16, took 
31.926585742 s
res0: Long = 1000

Is this correct?



3. I can’t seem to work out the math to derive the 530MB that is made available in 
the executor: 1024MB * memoryFraction(0.6) = 614.4 MB.
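
For what it's worth, a sketch of the arithmetic assuming the 1.1.x defaults, where a safety fraction (spark.storage.safetyFraction, default 0.9) is applied on top of spark.storage.memoryFraction, and the JVM reports a max heap somewhat below -Xmx:

// Assumption: storage memory = Runtime.maxMemory * 0.6 (memoryFraction) * 0.9 (safetyFraction).
// With -Xmx1024m the JVM typically reports a bit under 1024 MB, which lands near
// the 530.3 MB shown on the executor page.
val storageBytes = (Runtime.getRuntime.maxMemory * 0.6 * 0.9).toLong
println(storageBytes / (1024.0 * 1024.0) + " MB")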

Thanks!








Re: When does Spark switch from PROCESS_LOCAL to NODE_LOCAL or RACK_LOCAL?

2014-09-12 Thread Tsai Li Ming
Another observation I had was when reading from the local filesystem with "file://": it 
was stated as PROCESS_LOCAL, which was confusing. 

Regards,
Liming

On 13 Sep, 2014, at 3:12 am, Nicholas Chammas nicholas.cham...@gmail.com 
wrote:

 Andrew,
 
 This email was pretty helpful. I feel like this stuff should be summarized in 
 the docs somewhere, or perhaps in a blog post.
 
 Do you know if it is?
 
 Nick
 
 
 On Thu, Jun 5, 2014 at 6:36 PM, Andrew Ash and...@andrewash.com wrote:
 The locality is how close the data is to the code that's processing it.  
 PROCESS_LOCAL means data is in the same JVM as the code that's running, so 
 it's really fast.  NODE_LOCAL might mean that the data is in HDFS on the same 
 node, or in another executor on the same node, so is a little slower because 
 the data has to travel across an IPC connection.  RACK_LOCAL is even slower 
 -- data is on a different server so needs to be sent over the network.
 
 Spark switches to lower locality levels when there's no unprocessed data on a 
 node that has idle CPUs.  In that situation you have two options: wait until 
 the busy CPUs free up so you can start another task that uses data on that 
 server, or start a new task on a farther away server that needs to bring data 
 from that remote place.  What Spark typically does is wait a bit in the hopes 
 that a busy CPU frees up.  Once that timeout expires, it starts moving the 
 data from far away to the free CPU.
 
 The main tunable option is how far long the scheduler waits before starting 
 to move data rather than code.  Those are the spark.locality.* settings here: 
 http://spark.apache.org/docs/latest/configuration.html
 
 If you want to prevent this from happening entirely, you can set the values 
 to ridiculously high numbers.  The documentation also mentions that 0 has 
 special meaning, so you can try that as well.
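 
 For illustration, a minimal sketch of adjusting the wait described above (key names are from the linked configuration page; the millisecond values are placeholders, not recommendations):
 
 import org.apache.spark.SparkConf
 
 // spark.locality.wait is the base wait before the scheduler falls back to a less
 // local level; per-level variants (spark.locality.wait.process/.node/.rack) can be
 // set individually.
 val conf = new SparkConf()
   .set("spark.locality.wait", "10000")
   .set("spark.locality.wait.node", "30000")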
 
 Good luck!
 Andrew
 
 
 On Thu, Jun 5, 2014 at 3:13 PM, Sung Hwan Chung coded...@cs.stanford.edu 
 wrote:
 I noticed that sometimes tasks would switch from PROCESS_LOCAL (I'd assume 
 that this means fully cached) to NODE_LOCAL or even RACK_LOCAL.
 
 When these happen things get extremely slow.
 
 Does this mean that the executor got terminated and restarted?
 
 Is there a way to prevent this from happening (barring the machine actually 
 going down, I'd rather stick with the same process)?
 
 



[slurm-dev] Cyclic distribution problem

2014-06-30 Thread Tsai Li Ming

Hi,

I’m running 2 slurmds on a single host (built with --enable-multiple-slurmd). 
The total cpus are divided equally among the 2 nodes.

I’m trying to test the distribution modes=block/cyclic but the tasks are always 
allocated on the first node unless I use --ntasks-per-node=1

$ srun -n2 --distribution=block/cyclic sleep 100

I’m using:
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_MEMORY
NodeName=ltsai-dev-rhel7-1 NodeHostname=ltsai-dev-rhel7 Port=17001 Sockets=1 
CoresPerSocket=2 ThreadsPerCore=1 RealMemory=1841 State=UNKNOWN
NodeName=ltsai-dev-rhel7-2 NodeHostname=ltsai-dev-rhel7 Port=17002 Sockets=1 
CoresPerSocket=2 ThreadsPerCore=1 RealMemory=1841 State=UNKNOWN
PartitionName=compute Nodes=ltsai-dev-rhel7-[1-2] Default=YES MaxTime=INFINITE 
State=UP

Did I misconfigure something?

Thanks!

[slurm-dev] Reserved Partition name?

2014-04-05 Thread Tsai Li Ming

Hi,

I am using the following partition name DEFAULT/default but slurmctld is not 
able to start.

NodeName=compute State=UNKNOWN
PartitionName=default Nodes=compute Default=YES MaxTime=INFINITE State=UP

slurmctld: debug:  Reading slurm.conf file: /opt/slurm-14.03.0/etc/slurm.conf
slurmctld: topology NONE plugin loaded
slurmctld: debug:  No DownNodes
slurmctld: fatal: No PartitionName information available!

I can use other names such as “debug” or “compute”.

This is Slurm version 14.03.

Thanks!



[slurm-dev] Mismatch in cpu configuration

2014-04-05 Thread Tsai Li Ming

Hi,

I’m testing slurm on my vm.

My compute node is defined in slurm.conf without any CPU/Socket/Core/Thread 
information:
NodeName=compute State=UNKNOWN

# ./slurmd -C
ClusterName=(null) NodeName=compute CPUs=2 Boards=1 SocketsPerBoard=1 
CoresPerSocket=2 ThreadsPerCore=1 RealMemory=1463 TmpDisk=14649
UpTime=0-07:10:57

However, when I start slurmd:
# ./slurmd -D -vvv 
slurmd: debug2: hwloc_topology_init
slurmd: debug2: hwloc_topology_load
slurmd: debug:  CPUs:2 Boards:1 Sockets:1 CoresPerSocket:2 ThreadsPerCore:1 <==
slurmd: Node configuration differs from hardware: CPUs=1:2(hw) Boards=1:1(hw) 
SocketsPerBoard=1:1(hw) CoresPerSocket=1:2(hw) ThreadsPerCore=1:1(hw) <==
slurmd: topology NONE plugin loaded
slurmd: CPU frequency setting not configured for this node
slurmd: task NONE plugin loaded
slurmd: auth plugin for Munge (http://code.google.com/p/munge/) loaded
slurmd: debug:  spank: opening plugin stack 
/opt/slurm-14.03.0/etc/plugstack.conf
slurmd: Munge cryptographic signature plugin loaded
slurmd: Warning: Core limit is only 0 KB
slurmd: slurmd version 14.03.0 started
slurmd: Job accounting gather NOT_INVOKED plugin loaded
slurmd: debug:  job_container none plugin loaded
slurmd: switch NONE plugin loaded
slurmd: slurmd started on Sun, 06 Apr 2014 09:26:11 +0800
slurmd: CPUs=1 Boards=1 Sockets=1 Cores=1 Threads=1 Memory=1463 TmpDisk=14649 
Uptime=25884 <== 
slurmd: AcctGatherEnergy NONE plugin loaded
slurmd: AcctGatherProfile NONE plugin loaded
slurmd: AcctGatherInfiniband NONE plugin loaded
slurmd: AcctGatherFilesystem NONE plugin loaded
slurmd: debug2: No acct_gather.conf file 
(/opt/slurm-14.03.0/etc/acct_gather.conf)

Do I have to manually configure them for each compute node?
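
If so, a node line built from the slurmd -C output above would look roughly like this (a sketch using the values printed above, not a verified configuration):

NodeName=compute CPUs=2 Boards=1 SocketsPerBoard=1 CoresPerSocket=2 ThreadsPerCore=1 RealMemory=1463 State=UNKNOWN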

Thanks!

Re: Hadoop LR comparison

2014-04-01 Thread Tsai Li Ming
Thanks.

What would be the equivalent Hadoop code for which Spark published the 110s/0.9s 
comparison?


On 1 Apr, 2014, at 2:44 pm, DB Tsai dbt...@alpinenow.com wrote:

 Hi Li-Ming,
 
 This binary logistic regression using SGD is in 
 https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 
 We're working on multinomial logistic regression using Newton and L-BFGS 
 optimizer now. Will be released soon.
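 
 For reference, a minimal sketch of calling that SGD-based implementation (training is assumed to be an RDD[LabeledPoint]; the iteration count is a placeholder):
 
 import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
 import org.apache.spark.mllib.linalg.Vectors
 import org.apache.spark.mllib.regression.LabeledPoint
 
 // training is assumed to be an RDD[LabeledPoint], e.g.
 // val training = sc.textFile("lr_data.txt").map { line =>
 //   val parts = line.split(' ').map(_.toDouble)
 //   LabeledPoint(parts.head, Vectors.dense(parts.tail))
 // }.cache()
 val model = LogisticRegressionWithSGD.train(training, 100)
 println(model.weights)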
 
 
 Sincerely,
 
 DB Tsai
 Machine Learning Engineer
 Alpine Data Labs
 --
 Web: http://alpinenow.com/
 
 
 On Mon, Mar 31, 2014 at 11:38 PM, Tsai Li Ming mailingl...@ltsai.com wrote:
 Hi,
 
 Is the code available for Hadoop to calculate the Logistic Regression 
 hyperplane?
 
 I’m looking at the Examples:
 http://spark.apache.org/examples.html,
 
 where there is the 110s vs 0.9s in Hadoop vs Spark comparison.
 
 Thanks!
 



Re: Configuring shuffle write directory

2014-03-28 Thread Tsai Li Ming

Hi,

Thanks! I found out that I wasn’t setting the SPARK_JAVA_OPTS correctly..

I took a look at the process table and saw that the 
“org.apache.spark.executor.CoarseGrainedExecutorBackend” didn’t have the 
-Dspark.local.dir set.




On 28 Mar, 2014, at 1:05 pm, Matei Zaharia matei.zaha...@gmail.com wrote:

 I see, are you sure that was in spark-env.sh instead of 
 spark-env.sh.template? You need to copy it to just a .sh file. Also make sure 
 the file is executable.
 
 Try doing println(sc.getConf.toDebugString) in your driver program and seeing 
 what properties it prints. As far as I can tell, spark.local.dir should *not* 
 be set there, so workers should get it from their spark-env.sh. It’s true 
 that if you set spark.local.dir in the driver it would pass that on to the 
 workers for that job.
 
 Matei
 
 On Mar 27, 2014, at 9:57 PM, Tsai Li Ming mailingl...@ltsai.com wrote:
 
 Yes, I have tried that by adding it to the Worker. I can see the 
 app-20140328124540-000” in the local spark directory of the worker.
 
 But the “spark-local” directories are always written to /tmp since is the 
 default spark.local.dir is taken from java.io.tempdir?
 
 
 
 On 28 Mar, 2014, at 12:42 pm, Matei Zaharia matei.zaha...@gmail.com wrote:
 
 Yes, the problem is that the driver program is overriding it. Have you set 
 it manually in the driver? Or how did you try setting it in workers? You 
 should set it by adding
 
 export SPARK_JAVA_OPTS=“-Dspark.local.dir=whatever”
 
 to conf/spark-env.sh on those workers.
 
 Matei
 
 On Mar 27, 2014, at 9:04 PM, Tsai Li Ming mailingl...@ltsai.com wrote:
 
 Anyone can help?
 
 How can I configure a different spark.local.dir for each executor?
 
 
 On 23 Mar, 2014, at 12:11 am, Tsai Li Ming mailingl...@ltsai.com wrote:
 
 Hi,
 
 Each of my worker node has its own unique spark.local.dir.
 
 However, when I run spark-shell, the shuffle writes are always written to 
 /tmp despite being set when the worker node is started.
 
 By specifying the spark.local.dir for the driver program, it seems to 
 override the executor? Is there a way to properly define it in the worker 
 node?
 
 Thanks!
 
 
 
 



Re: Configuring shuffle write directory

2014-03-27 Thread Tsai Li Ming
Anyone can help?

How can I configure a different spark.local.dir for each executor?


On 23 Mar, 2014, at 12:11 am, Tsai Li Ming mailingl...@ltsai.com wrote:

 Hi,
 
 Each of my worker node has its own unique spark.local.dir.
 
 However, when I run spark-shell, the shuffle writes are always written to 
 /tmp despite being set when the worker node is started.
 
 By specifying the spark.local.dir for the driver program, it seems to 
 override the executor? Is there a way to properly define it in the worker 
 node?
 
 Thanks!



Setting SPARK_MEM higher than available memory in driver

2014-03-27 Thread Tsai Li Ming
Hi,

My worker nodes have more memory than the host from which I’m submitting my driver 
program, but it seems that SPARK_MEM also sets the Xmx of the spark shell?

$ SPARK_MEM=100g MASTER=spark://XXX:7077 bin/spark-shell

Java HotSpot(TM) 64-Bit Server VM warning: INFO: 
os::commit_memory(0x7f736e13, 205634994176, 0) failed; error='Cannot 
allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 205634994176 bytes for 
committing reserved memory.

I want to allocate at least 100GB of memory per executor. The allocated memory 
on the executor seems to depend on the -Xmx heap size of the driver?
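
For illustration, a minimal sketch of requesting large executors without inflating the driver heap, assuming the per-executor setting (spark.executor.memory) is what is wanted here; the driver's own heap is controlled separately, so SPARK_MEM would not need to be set to 100g:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setMaster("spark://XXX:7077")
  .setAppName("executor-memory-sketch")
  .set("spark.executor.memory", "100g")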

Thanks!





Re: Kmeans example reduceByKey slow

2014-03-24 Thread Tsai Li Ming
Hi,

This is on a 4-node cluster, each node with 32 cores/256GB RAM. 

Spark (0.9.0) is deployed in standalone mode.

Each worker is configured with 192GB. Spark executor memory is also 192GB. 

This is on the first iteration. K=50. Here’s the code I use:
http://pastebin.com/2yXL3y8i , which is a copy-and-paste of the example.

Thanks!



On 24 Mar, 2014, at 2:46 pm, Xiangrui Meng men...@gmail.com wrote:

 Hi Tsai,
 
 Could you share more information about the machine you used and the
 training parameters (runs, k, and iterations)? It can help solve your
 issues. Thanks!
 
 Best,
 Xiangrui
 
 On Sun, Mar 23, 2014 at 3:15 AM, Tsai Li Ming mailingl...@ltsai.com wrote:
 Hi,
 
 At the reduceByKey stage, it takes a few minutes before the tasks start 
 working.
 
 I have -Dspark.default.parallelism=127 cores (n-1).
 
 CPU/Network/IO is idling across all nodes when this is happening.
 
 And there is nothing particular on the master log file. From the spark-shell:
 
 14/03/23 18:13:50 INFO TaskSetManager: Starting task 3.0:124 as TID 538 on 
 executor 2: XXX (PROCESS_LOCAL)
 14/03/23 18:13:50 INFO TaskSetManager: Serialized task 3.0:124 as 38765155 
 bytes in 193 ms
 14/03/23 18:13:50 INFO TaskSetManager: Starting task 3.0:125 as TID 539 on 
 executor 1: XXX (PROCESS_LOCAL)
 14/03/23 18:13:50 INFO TaskSetManager: Serialized task 3.0:125 as 38765155 
 bytes in 96 ms
 14/03/23 18:13:50 INFO TaskSetManager: Starting task 3.0:126 as TID 540 on 
 executor 0: XXX (PROCESS_LOCAL)
 14/03/23 18:13:50 INFO TaskSetManager: Serialized task 3.0:126 as 38765155 
 bytes in 100 ms
 
 But it stops there for some significant time before any movement.
 
 In the stage detail of the UI, I can see that there are 127 tasks running, 
 but the duration of each is at least a few minutes.
 
 I'm working off local storage (not hdfs) and the kmeans data is about 6.5GB 
 (50M rows).
 
 Is this a normal behaviour?
 
 Thanks!



Re: Kmeans example reduceByKey slow

2014-03-24 Thread Tsai Li Ming
Thanks, let me try with a smaller K.

Does the size of the input data matter for the example? Currently I have 50M 
rows. What is a reasonable size to demonstrate the capability of Spark?





On 24 Mar, 2014, at 3:38 pm, Xiangrui Meng men...@gmail.com wrote:

 K = 50 is certainly a large number for k-means. If there is no
 particular reason to have 50 clusters, could you try to reduce it
 to, e.g, 100 or 1000? Also, the example code is not for large-scale
 problems. You should use the KMeans algorithm in mllib clustering for
 your problem.
 
 -Xiangrui
 
 On Sun, Mar 23, 2014 at 11:53 PM, Tsai Li Ming mailingl...@ltsai.com wrote:
 Hi,
 
 This is on a 4 nodes cluster each with 32 cores/256GB Ram.
 
 (0.9.0) is deployed in a stand alone mode.
 
 Each worker is configured with 192GB. Spark executor memory is also 192GB.
 
 This is on the first iteration. K=50. Here's the code I use:
 http://pastebin.com/2yXL3y8i , which is a copy-and-paste of the example.
 
 Thanks!
 
 
 
 On 24 Mar, 2014, at 2:46 pm, Xiangrui Meng men...@gmail.com wrote:
 
 Hi Tsai,
 
 Could you share more information about the machine you used and the
 training parameters (runs, k, and iterations)? It can help solve your
 issues. Thanks!
 
 Best,
 Xiangrui
 
 On Sun, Mar 23, 2014 at 3:15 AM, Tsai Li Ming mailingl...@ltsai.com wrote:
 Hi,
 
 At the reduceByKey stage, it takes a few minutes before the tasks start 
 working.
 
 I have -Dspark.default.parallelism=127 cores (n-1).
 
 CPU/Network/IO is idling across all nodes when this is happening.
 
 And there is nothing particular on the master log file. From the 
 spark-shell:
 
 14/03/23 18:13:50 INFO TaskSetManager: Starting task 3.0:124 as TID 538 on 
 executor 2: XXX (PROCESS_LOCAL)
 14/03/23 18:13:50 INFO TaskSetManager: Serialized task 3.0:124 as 38765155 
 bytes in 193 ms
 14/03/23 18:13:50 INFO TaskSetManager: Starting task 3.0:125 as TID 539 on 
 executor 1: XXX (PROCESS_LOCAL)
 14/03/23 18:13:50 INFO TaskSetManager: Serialized task 3.0:125 as 38765155 
 bytes in 96 ms
 14/03/23 18:13:50 INFO TaskSetManager: Starting task 3.0:126 as TID 540 on 
 executor 0: XXX (PROCESS_LOCAL)
 14/03/23 18:13:50 INFO TaskSetManager: Serialized task 3.0:126 as 38765155 
 bytes in 100 ms
 
 But it stops there for some significant time before any movement.
 
 In the stage detail of the UI, I can see that there are 127 tasks running, 
 but the duration of each is at least a few minutes.
 
 I'm working off local storage (not hdfs) and the kmeans data is about 
 6.5GB (50M rows).
 
 Is this a normal behaviour?
 
 Thanks!
 



[R] Running Rmpi/OpenMPI issues

2014-03-22 Thread Tsai Li Ming
Hi,

I have R 3.0.3 and OpenMPI 1.6.5.

Here’s my test script:
library(snow)

nbNodes <- 4
cl <- makeCluster(nbNodes, "MPI")
clusterCall(cl, function() Sys.info()[c("nodename", "machine")])
mpi.quit()

And the mpirun command:
/opt/openmpi-1.6.5-intel/bin/mpirun -np 1 -H host1,host2,host3,host4 R 
--no-save < ~/test_mpi.R

Here’s the output:
> cl <- makeCluster(nbNodes, "MPI")
Loading required package: Rmpi
4 slaves are spawned successfully. 0 failed.
> clusterCall(cl, function() Sys.info()[c("nodename", "machine")])
[[1]]
nodename  machine 
 "host1" "x86_64" 

[[2]]
nodename  machine 
 "host1" "x86_64" 

[[3]]
nodename  machine 
 "host1" "x86_64" 

[[4]]
nodename  machine 
 "host1" "x86_64" 

> 
> mpi.quit()

I followed the instructions from:
http://www.statistik.uni-dortmund.de/useR-2008/tutorials/useR2008introhighperfR.pdf
, specifically to use -np 1

1. Why is it not running on the rest of the nodes? I can see all 4 processes on 
host1 and no orted daemon running.

What should be the correct way to run this? 

2. mpi.quit() just hangs there.

Thanks!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Configuring shuffle write directory

2014-03-22 Thread Tsai Li Ming
Hi,

Each of my worker node has its own unique spark.local.dir.

However, when I run spark-shell, the shuffle writes are always written to /tmp 
despite being set when the worker node is started.

By specifying the spark.local.dir for the driver program, it seems to override 
the executor? Is there a way to properly define it in the worker node?

Thanks!

Spark temp dir (spark.local.dir)

2014-03-13 Thread Tsai Li Ming
Hi,

I'm confused about the -Dspark.local.dir and SPARK_WORKER_DIR(--work-dir).

What's the difference?

I have set -Dspark.local.dir for all my worker nodes but I'm still seeing 
directories being created in /tmp when the job is running.

I have also tried setting -Dspark.local.dir when I run the application.

Thanks!



Re: Spark temp dir (spark.local.dir)

2014-03-13 Thread Tsai Li Ming
 spark.local.dir can and should be set both on the executors and on the 
 driver (if the driver broadcast variables, the files will be stored in this 
 directory)
Do you mean the worker nodes?

I don’t think they are jetty connectors, and the directories are empty:
/tmp/spark-3e330cdc-7540-4313-9f32-9fa109935f17/jars
/tmp/spark-3e330cdc-7540-4313-9f32-9fa109935f17/files

I run the application like this, even with the java.io.tmpdir :
bin/run-example -Dspark.executor.memory=14g -Dspark.local.dir=/mnt/storage1/lm 
-Djava.io.tmpdir=/mnt/storage1/lm org.apache.spark.examples.SparkLR 
spark://oct1:7077 10




On 13 Mar, 2014, at 5:33 pm, Guillaume Pitel guillaume.pi...@exensa.com wrote:

 Also, I think the jetty connector will create a small file or directory in 
 /tmp regardless of the spark.local.dir 
 
 It's very small, about 10KB
 
 Guillaume
 I'm not 100% sure but I think it goes like this : 
 
 spark.local.dir can and should be set both on the executors and on the 
 driver (if the driver broadcast variables, the files will be stored in this 
 directory)
 
 the SPARK_WORKER_DIR is where the jars and the log output of the executors 
 are placed (default $SPARK_HOME/work/), and it should be cleaned regularly 
 
 In $SPARK_HOME/logs are found the logs of the workers and master
 
 Guillaume
 Hi,
 
 I'm confused about the -Dspark.local.dir and SPARK_WORKER_DIR(--work-dir).
 
 What's the difference?
 
 I have set -Dspark.local.dir for all my worker nodes but I'm still seeing 
 directories being created in /tmp when the job is running.
 
 I have also tried setting -Dspark.local.dir when I run the application.
 
 Thanks!
 
 
 
 -- 
 Guillaume PITEL, Président 
 +33(0)6 25 48 86 80
 
 eXenSa S.A.S. 
 41, rue Périer - 92120 Montrouge - FRANCE 
 Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05
 
 
 -- 
 Guillaume PITEL, Président 
 +33(0)6 25 48 86 80
 
 eXenSa S.A.S. 
 41, rue Périer - 92120 Montrouge - FRANCE 
 Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05



data locality in logs

2014-02-05 Thread Tsai Li Ming
Hi,

In older posts on Google Groups, there was mention of checking the logs on 
“preferred/non-preferred” for data locality.

But I can’t seem to find this on 0.9.0 anymore? Has this been changed to 
“PROCESS_LOCAL” , like this:
14/02/06 13:51:45 INFO TaskSetManager: Starting task 9.0:50 as TID 568 on 
executor 0: xxx (PROCESS_LOCAL)

What is the difference between process-local and node-local?

Thanks,
Liming






ClassNotFoundException: PRCombiner

2014-02-03 Thread Tsai Li Ming
Hi,

While running the Bagel Wikipedia Page Rank example 
(org.apache.spark.examples.bagel.WikipediaPageRank), it fails with this error at 
the end:
org.apache.spark.SparkException: Job aborted: Task 3.0:4 failed 4 times (most 
recent failure: Exception failure: java.lang.ClassNotFoundException: 
org.apache.spark.examples.bagel.PRCombiner)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/02/04 09:56:34 INFO TaskSetManager: Starting task 4.0:2 as TID 170 on 
executor 0: oct1 (NODE_LOCAL)
14/02/04 09:56:34 INFO TaskSetManager: Serialized task 4.0:2 as 1754 bytes in 0 
ms
14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state RUNNING 
from TID 147 because its task set is gone
14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state RUNNING 
from TID 148 because its task set is gone
14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state FAILED 
from TID 147 because its task set is gone
14/02/04 09:56:34 INFO TaskSetManager: Starting task 4.0:3 as TID 171 on 
executor 2: oct2 (NODE_LOCAL)
14/02/04 09:56:34 INFO TaskSetManager: Serialized task 4.0:3 as 1754 bytes in 0 
ms
14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state RUNNING 
from TID 149 because its task set is gone
14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state RUNNING 
from TID 150 because its task set is gone
14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state FAILED 
from TID 148 because its task set is gone

I’m using 0.9.0.

Thanks!



Re: ClassNotFoundException: PRCombiner

2014-02-03 Thread Tsai Li Ming

On 4 Feb, 2014, at 10:08 am, Tsai Li Ming mailingl...@ltsai.com wrote:

 Hi,
 
 While running the Bagel’s Wikipedia Page Rank example 
 (org.apache.spark.examples.bagel.WikipediaPageRank), it is having this error 
 at the end:
 org.apache.spark.SparkException: Job aborted: Task 3.0:4 failed 4 times (most 
 recent failure: Exception failure: java.lang.ClassNotFoundException: 
 org.apache.spark.examples.bagel.PRCombiner)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
   at scala.Option.foreach(Option.scala:236)
   at 
 org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
   at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
   at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
   at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 14/02/04 09:56:34 INFO TaskSetManager: Starting task 4.0:2 as TID 170 on 
 executor 0: oct1 (NODE_LOCAL)
 14/02/04 09:56:34 INFO TaskSetManager: Serialized task 4.0:2 as 1754 bytes in 
 0 ms
 14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state RUNNING 
 from TID 147 because its task set is gone
 14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state RUNNING 
 from TID 148 because its task set is gone
 14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state FAILED 
 from TID 147 because its task set is gone
 14/02/04 09:56:34 INFO TaskSetManager: Starting task 4.0:3 as TID 171 on 
 executor 2: oct2 (NODE_LOCAL)
 14/02/04 09:56:34 INFO TaskSetManager: Serialized task 4.0:3 as 1754 bytes in 
 0 ms
 14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state RUNNING 
 from TID 149 because its task set is gone
 14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state RUNNING 
 from TID 150 because its task set is gone
 14/02/04 09:56:34 INFO TaskSchedulerImpl: Ignoring update with state FAILED 
 from TID 148 because its task set is gone
 
 I’m using 0.9.0.
 
 Thanks!
 

The stdout is available here: http://pastebin.com/T1wYB9mh

There are a few more exceptions during the run:
java.lang.ClassNotFoundException: 
org.apache.spark.examples.bagel.WikipediaPageRank$$anonfun$1

Regards,
Liming
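
A sketch, assuming Spark 0.9.x and a locally built examples assembly (the master URL, sparkHome and jar path below are placeholders), of one way to make sure the example classes reach the executors: pass the jar to the SparkContext constructor so the workers can fetch it.

import org.apache.spark.SparkContext

object WikipediaPageRankLauncher {
  def main(args: Array[String]): Unit = {
    // Ship the examples jar to the executors so that classes such as
    // org.apache.spark.examples.bagel.PRCombiner can be deserialized there.
    // Both paths below are assumptions about a local 0.9.0 build layout.
    val examplesJar = "/opt/spark-0.9.0/examples/target/scala-2.10/spark-examples-assembly-0.9.0.jar"
    val sc = new SparkContext(
      "spark://master:7077",   // placeholder master URL
      "WikipediaPageRank",
      "/opt/spark-0.9.0",      // sparkHome on the workers (assumed)
      Seq(examplesJar))        // served by the driver and fetched by each executor
    // ... run the Bagel PageRank job with this context ...
    sc.stop()
  }
}

For the stock examples, the bundled bin/run-example script should take care of the same classpath wiring.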




Re: [CentOS] Cloud Computing

2009-07-20 Thread Tsai Li Ming


Bogdan Nicolescu wrote:
 
 
 
 
 - Original Message 
 From: Tsai Li Ming lt...@osgdc.org
 To: CentOS mailing list centos@centos.org
 Sent: Monday, July 20, 2009 12:18:26 AM
 Subject: Re: [CentOS] Cloud Computing

 Hi,



 Bogdan Nicolescu wrote:



 - Original Message 
 From: Ryan J M 
 To: CentOS mailing list 
 Sent: Saturday, July 18, 2009 8:59:02 AM
 Subject: Re: [CentOS] Cloud Computing

 On Sat, Jul 18, 2009 at 4:36 AM, Matt wrote:
 Is anyone creating a cloud based on Centos yet?

 Ubuntu seems to be quite active there:

 http://www.ubuntu.com/products/whatisubuntu/serveredition/cloud/uec


 So there is still no CentOS HPC solution provided yet; has the upstream vendor
 disclosed the source?

 ftp://ftp.redhat.com/pub/redhat/linux/beta/RHHPC is still not accessible.




 If the CentOS community wishes to obtain the srpms directly from us, I can 
 provide them, as the rhhpc srpms are obtained from us.

 As expressed before to the community here and to KB, we are willing to 
 help build and contribute to the CentOS HPC SIG.

 -Liming
 ___
 CentOS mailing list
 CentOS@centos.org
 http://lists.centos.org/mailman/listinfo/centos
 
 Yes, please do provide the srpms
 
 thanks
 
 bn

Also, we are going to start preparing ours to work with RHEL 5.4 when it 
is out in the coming months. Can the community wait till our 5.4-compatible 
version is ready? This may coincide with the CentOS 5.4 release.

-Liming
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Cloud Computing

2009-07-20 Thread Tsai Li Ming


Karanbir Singh wrote:
 Tsai Li Ming wrote:
 Also, we are going to start preparing ours to work with RHEL 5.4 when it 
 is out in the coming months. Can the community wait till our 5.4-compatible 
 version is ready? This may coincide with the CentOS 5.4 release.
 
 The last time we had this conversation, there was an issue that 'your 
 srpms' are really not the 'Red Hat' srpms. Has this situation changed?
 

KB,

Our srpms[1] are given to Red Hat and thus are rebuilt by them. 
EPEL srpms are not given because RH takes them directly from their own 
EPEL builds.

To date, RH has not released the srpms. A request from the community is 
certainly helpful here.

If you download the srpm from RHN and compare it against ours, it is not 
identical. The md5sums will not match because the srpms are generated by 
their build system using ours. Each srpm has a Red Hat buildhost, is 
signed by them, etc. However, the content is the same.

If it's a centos policy to strictly use rh srpms, then we would be 
better off asking RH to release them to the community. Kusu/PCM is GPL v2.

-Liming
[1] PCM RHHPC edition srpms, since PCM has various editions.

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Cloud Computing

2009-07-19 Thread Tsai Li Ming
Hi,



Bogdan Nicolescu wrote:
 
 
 
 
 - Original Message 
 From: Ryan J M sync@gmail.com
 To: CentOS mailing list centos@centos.org
 Sent: Saturday, July 18, 2009 8:59:02 AM
 Subject: Re: [CentOS] Cloud Computing

 On Sat, Jul 18, 2009 at 4:36 AM, Matt wrote:
 Is anyone creating a cloud based on Centos yet?

 Ubuntu seems to be quite active there:

 http://www.ubuntu.com/products/whatisubuntu/serveredition/cloud/uec


 So there is still no CentOS HPC solution provided yet; has the upstream vendor
 disclosed the source?

 ftp://ftp.redhat.com/pub/redhat/linux/beta/RHHPC is still not accessible.



snip

If the CentOS community wishes to obtain the srpms directly from us, I can 
provide them, as the rhhpc srpms are obtained from us.

As expressed before to the community here and to KB, we are willing to 
help build and contribute to the CentOS HPC SIG.

-Liming
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Set hostname via DHCP ?

2009-06-29 Thread Tsai Li Ming


Niki Kovacs wrote:
 Niki Kovacs wrote:
 If I take a look at /var/lib/dhclient/dhclient-eth0.leases (on the 
 client), here's a summary of the lease:

 lease {
interface eth0;
fixed-address 192.168.1.2;
option subnet-mask 255.255.255.0;
option routers 192.168.1.254;
option dhcp-lease-time 86400;
option dhcp-message-type 5;
option domain-name-servers 62.4.16.70,62.4.17.69;
option dhcp-server-identifier 192.168.1.252;
option broadcast-address 192.168.1.255;
option host-name raymonde;
option domain-name local;
renew 1 2009/6/29 17:04:30;
rebind 2 2009/6/30 04:47:44;
expire 2 2009/6/30 07:47:44;
 }

 Here's what 'hostname' returns:

 # hostname
 raymonde

 But when I go for the domain name, I get this:

 # hostname -d
 hostname: Hôte inconnu -- means 'Unknown host' in French :o)

 Any idea what I'm doing wrong here?
 
 OK, I think I got my mistake. When I specify a domain name (with 'option 
 domain-name'), this is in fact what gets written to the client's 
 /etc/resolv.conf in the 'search' line. But to handle the fully qualified 
 domain name centrally, I have to use DNS.
 
 Correct me if I'm wrong.
 

In our environment, our DHCP client machines (RHEL or CentOS) get their 
hostnames via DNS lookups (not sure why).

Our DHCP server generally just provides the usual configuration, like 
DNS/search options, etc. The IP addresses in our DHCP range have both 
forward and reverse mappings on our DNS server, and the clients obtain 
the proper fully qualified hostname upon DHCP request.
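
A sketch of the matching records, assuming BIND zone files and reusing the hostname from the lease quoted above (zone names and addresses are illustrative only):

; forward zone "local"
raymonde.local.              IN A     192.168.1.2

; reverse zone "1.168.192.in-addr.arpa"
2.1.168.192.in-addr.arpa.    IN PTR   raymonde.local.

With both records present, a reverse lookup on the leased address resolves back to raymonde.local, which is what lets the client derive its fully qualified name.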


HTH,
Liming


___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] filesystem rpm fails when /home is NFS mounted

2009-04-02 Thread Tsai Li Ming


Scott Silva wrote:
 on 4-2-2009 2:00 PM Anne Wilson spake the following:
 On Thursday 02 April 2009 21:40:59 R P Herrold wrote:
 On Wed, 1 Apr 2009, Paul Heinlein wrote:
 I don't know if it's a bug or a feature, but the
 filesystem-2.4.0-2.el5.centos rpm won't upgrade cleanly if /home is an
 NFS filesystem.
 I confirm this is present in 5.3 where /home is an NFS mount,
 and that I missed it in testing.  A workaround is:

 1. Boot into single user mode.
 2. run: /sbin/service network start
 3. run: yum -y update filesystem

 If your system emitted the warning, but did not 'bail', it is 
 safe to retrieve the rpm locally, and to run:

 # rpm -Uvh filesystem*rpm --force

 as there are no scripts in play:

 [herr...@centos-5 ~]$ sudo rpm -q --scripts filesystem
 [herr...@centos-5 ~]$

 The cause is NFS root_squash being in effect when an NFS 
 overmount is on a mountpoint, it seems.  /home happens to 
 expose it.

 It seems Paul and I are the last two users of NFS mounted
 /home left.

 I have /home exported and ran the upgrade from this laptop over the network, 
 where that directory is mounted and displayed in a folderview under KDE4.  I 
 had no problems whatsoever.  Is this the sort of situation you mean?

 Anne
 
 The way I read it was their /home was mounted on NFS, not just exported.
 
 

I had a problem with /mnt or /media too, with a mounted ISO. I had to 
umount the ISO before the filesystem rpm could be updated. This happened 
when I ran yum update to RHEL 5.3 recently.
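
On the root_squash cause quoted above, one possible workaround sketch, assuming the server side uses /etc/exports (the subnet is a placeholder), is to relax root squashing only for the duration of the upgrade:

# /etc/exports on the NFS server -- temporary, revert after the upgrade
/home  192.168.1.0/24(rw,sync,no_root_squash)

# re-export without disturbing existing mounts
exportfs -ra

Given the security implications of no_root_squash, the single-user-mode workaround quoted above is the safer route.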


-Liming
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] filesystem rpm fails when /home is NFS mounted

2009-04-02 Thread Tsai Li Ming


R P Herrold wrote:

 Thank you for the confirmation.  I have not had a chance to 
 file it in the centos tracker yet, and hope to get it filed 
 during tomorrow's business hours.  Similarly, I have not checked 
 upstream's tracker yet.  If needed, I'll file there as well, 
 but I cannot imagine it will be deemed so critical as to pull 
 a fast fix.

Found this on upstream bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=483071

-Liming
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [Kusu-users] repopatch

2008-04-10 Thread Tsai Li Ming


Hi J,

Which kernel updates are you interested in? There are entries in 
the database that relate to the updated kernels.


FYI, kernel-xen is not used right now.

-Liming



Jay wrote:

How do I select only some of the kernel updates found when running repopatch?  
Or do I have to dissect the update kit after the fact?
   
  Thanks,

  J

 __
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 
___

Kusu-users mailing list
Kusu-users@osgdc.org
http://mail.osgdc.org/mailman/listinfo/kusu-users

___
Kusu-users mailing list
Kusu-users@osgdc.org
http://mail.osgdc.org/mailman/listinfo/kusu-users


[Mailman-Users] Getting notified for subscription bounce

2006-11-11 Thread Tsai Li Ming
Hi,

Is it possible for the list owner to get a bounce when the confirmation
emails do not get sent out to the subscribers, usually due to a bounce or a 550?

I have the following entries in my postfix logs, but the owner is not getting any
bounces:

Nov 11 13:43:46 mail postfix/local[19345]: 313C22FED3:
to=[EMAIL PROTECTED],
orig_to=[EMAIL PROTECTED],
  relay=local, delay=0, status=sent (delivered to command:
/usr/lib/mailman/mail/mailman bounces listname)
Nov 11 13:43:46 mail postfix/qmgr[19329]: 313C22FED3: removed

I have the following settings in my bounce processing section:
bounce_processing: Yes
bounce_score_threshold: 5.0
bounce_info_stale_after: 7
bounce_you_are_disabled_warnings: 3
bounce_you_are_disabled_warnings_interval: 7
bounce_unrecognized_goes_to_list_owner: Yes
bounce_notify_owner_on_disable: Yes
bounce_notify_owner_on_removal: Yes



Sending to [EMAIL PROTECTED] does generate an Uncaught bounce
notification


Regards,
Liming

--
Mailman-Users mailing list
Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: 
http://mail.python.org/mailman/options/mailman-users/archive%40jab.org

Security Policy: 
http://www.python.org/cgi-bin/faqw-mm.py?req=showamp;file=faq01.027.htp


dns lookup for reverse proxy

2004-09-22 Thread Tsai Li Ming
Dear all,
I have the following directives in my conf file.
<IfModule mod_proxy.c>
ProxyRequests Off
RewriteEngine On
ProxyPass /Server/  http://localhost:8081
ProxyPassReverse  /Server/  http://localhost:8081
RewriteRule  ^/Server$  /Server/  [P]
</IfModule>
My error log:
[Wed Sep 22 02:59:41 2004] [error] [client 192.168.1.22] proxy: DNS 
lookup failure for: example2 returned by /Server

Does a forced proxy require a DNS lookup on the httpd server itself? 
When I change from [P] to [R], it works, but our client does not 
understand an HTTP 302 (it is an Axis client).

-Liming
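
A possible reading of that error (a sketch, not verified against this setup): with the [P] flag, a relative substitution gets expanded against the local server name before mod_proxy sees it, so the proxy ends up doing a DNS lookup on the front-end host (example2) instead of on the backend. Giving the rule an absolute URL keeps it pointed at localhost:

RewriteEngine On
# proxy the bare /Server straight to the backend instead of re-entering
# the front end via its own hostname
RewriteRule  ^/Server$  http://localhost:8081/  [P]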


Re: dns lookup for reverse proxy

2004-09-22 Thread Tsai Li Ming
It didn't work, because it's not under the DocumentRoot.
[Wed Sep 22 03:30:14 2004] [error] [client 192.168.1.22] File does not 
exist: /var/www/html/Server

-Liming
Ian Holsman wrote:
try something like
<IfModule mod_proxy.c>
ProxyPass /Server/ http://localhost:8081/
ProxyPassReverse /Server/ http://localhost:8081/
</IfModule>
you shouldn't need the other stuff to make it work
On 22/09/2004, at 1:07 PM, Tsai Li Ming wrote:
Dear all,
I have the following directives in my conf file.
<IfModule mod_proxy.c>
ProxyRequests Off
 RewriteEngine On
 ProxyPass /Server/  http://localhost:8081
 ProxyPassReverse  /Server/  http://localhost:8081
 RewriteRule  ^/Server$  /Server/  [P]
</IfModule>

My error log:
[Wed Sep 22 02:59:41 2004] [error] [client 192.168.1.22] proxy: DNS
lookup failure for: example2 returned by /Server
Does a forced proxy require a DNS lookup on the httpd server itself?
When I change from [P] to [R], it works, but our client does not
understand an HTTP 302 (it is an Axis client).
-Liming
- --
Ian Holsman
Director
Network Management Systems
CNET Networks
PH: 415-344-2608 (USA) /(++61) 3-9818-0132 (Australia)
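
The "File does not exist: /var/www/html/Server" error suggests the bare /Server (no trailing slash) never matched ProxyPass /Server/ and fell through to the DocumentRoot. A sketch of one way to catch it, combining the stanza above with a proxied rewrite (assuming mod_rewrite is loaded), since a 302 redirect is not an option for this client:

<IfModule mod_proxy.c>
ProxyPass        /Server/ http://localhost:8081/
ProxyPassReverse /Server/ http://localhost:8081/

# catch the bare /Server as well, so it never falls through to the filesystem
RewriteEngine On
RewriteRule  ^/Server$  http://localhost:8081/  [P]
</IfModule>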




[Samba] cp input/output error

2004-09-19 Thread Tsai Li Ming
Hi,
I have been getting random input/output errors when trying to cp an ISO 
(100 MB) to a Samba mount point. I get the same random error when I try 
to cp a text file over too.

cp: writing `/public/cd.iso': Input/output error
my fstab:
//fserv/public  /public smbfs fmask=666,username=,password= 1
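
For comparison, a sketch of a full six-field smbfs entry, assuming the credentials are moved into a separate file (the path below is a placeholder) rather than left on the mount line:

//fserv/public  /public  smbfs  credentials=/etc/samba/fserv.cred,fmask=666,dmask=777  0  0

# /etc/samba/fserv.cred (mode 0600), placeholders only:
# username=...
# password=...

This does not explain the Input/output errors by itself, but it makes the dump/pass fields explicit and keeps the password out of /etc/fstab.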
Thanks,
Liming
--
To unsubscribe from this list go to the following URL and read the
instructions:  http://lists.samba.org/mailman/listinfo/samba