[jira] [Comment Edited] (SPARK-15039) Kinesis reciever does not work in Yarn

2016-05-03 Thread Tsai Li Ming (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268675#comment-15268675 ] Tsai Li Ming edited comment on SPARK-15039 at 5/3/16 1:16 PM: -- [~zsxwing

[jira] [Commented] (SPARK-15039) Kinesis reciever does not work in Yarn

2016-05-03 Thread Tsai Li Ming (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268675#comment-15268675 ] Tsai Li Ming commented on SPARK-15039: -- [~zsxwing] Nothing suspicious in the logs. The streaming

[jira] [Updated] (SPARK-15039) Kinesis reciever does not work in Yarn

2016-04-30 Thread Tsai Li Ming (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsai Li Ming updated SPARK-15039: - Description: Hi, Using the pyspark kinesis example, it does not receive any messages from

[jira] [Updated] (SPARK-15039) Kinesis reciever does not work in Yarn

2016-04-30 Thread Tsai Li Ming (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsai Li Ming updated SPARK-15039: - Description: Hi, Using the pyspark kinesis example, it does not receive any messages from

[jira] [Created] (SPARK-15039) Kinesis reciever does not work in Yarn

2016-04-30 Thread Tsai Li Ming (JIRA)
Tsai Li Ming created SPARK-15039: Summary: Kinesis reciever does not work in Yarn Key: SPARK-15039 URL: https://issues.apache.org/jira/browse/SPARK-15039 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-3220) K-Means clusterer should perform K-Means initialization in parallel

2016-02-10 Thread Tsai Li Ming (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140623#comment-15140623 ] Tsai Li Ming commented on SPARK-3220: - I built Derrick's kmeans against Spark 1.6.0 and ran {code

[jira] [Comment Edited] (SPARK-3220) K-Means clusterer should perform K-Means initialization in parallel

2016-02-10 Thread Tsai Li Ming (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140623#comment-15140623 ] Tsai Li Ming edited comment on SPARK-3220 at 2/10/16 11:01 AM: --- I built

[jira] [Commented] (SPARK-3220) K-Means clusterer should perform K-Means initialization in parallel

2016-02-09 Thread Tsai Li Ming (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140434#comment-15140434 ] Tsai Li Ming commented on SPARK-3220: - [~derrickburns], Is your private fork at https://github.com

The usage of OpenBLAS

2015-06-26 Thread Tsai Li Ming
Hi, I found out that the instructions for OpenBLAS have been changed by the author of netlib-java in https://github.com/apache/spark/pull/4448, since Spark 1.3.0. In that PR, I asked whether there’s still a need to compile OpenBLAS with USE_THREAD=0, and also about Intel MKL. Is it still

Issues building 1.4.0 using make-distribution

2015-06-17 Thread Tsai Li Ming
Hi, I downloaded the source from the Downloads page and ran the make-distribution.sh script: # ./make-distribution.sh --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package The script has “-x” set at the beginning. ++ /tmp/a/spark-1.4.0/build/mvn help:evaluate

Documentation for external shuffle service in 1.4.0

2015-06-17 Thread Tsai Li Ming
Hi, I can’t seem to find any documentation on this feature in 1.4.0? Regards, Liming

Re: Not getting event logs >= spark 1.3.1

2015-06-16 Thread Tsai Li Ming
Forgot to mention this is on standalone mode. Is my configuration wrong? Thanks, Liming On 15 Jun, 2015, at 11:26 pm, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, I have this in my spark-defaults.conf (same for hdfs): spark.eventLog.enabled true spark.eventLog.dir

Not getting event logs >= spark 1.3.1

2015-06-15 Thread Tsai Li Ming
Hi, I have this in my spark-defaults.conf (same for hdfs): spark.eventLog.enabled true spark.eventLog.dir file:/tmp/spark-events spark.history.fs.logDirectory file:/tmp/spark-events While the app is running, there is a “.inprogress” directory. However when the job

Re: Logstash as a source?

2015-02-01 Thread Tsai Li Ming
I have been using a logstash alternative - fluentd to ingest the data into hdfs. I had to configure fluentd to not append the data so that spark streaming will be able to pick up the new logs. -Liming On 2 Feb, 2015, at 6:05 am, NORD SC jan.algermis...@nordsc.com wrote: Hi, I plan to

Re: Confused why I'm losing workers/executors when writing a large file to S3

2015-01-21 Thread Tsai Li Ming
I’m getting the same issue on Spark 1.2.0. Despite having set “spark.core.connection.ack.wait.timeout” in spark-defaults.conf and verified in the job UI (port 4040) environment tab, I still get the “no heartbeat in 60 seconds” error. spark.core.connection.ack.wait.timeout=3600 15/01/22

Understanding stages in WebUI

2014-11-25 Thread Tsai Li Ming
Hi, I have the classic word count example: file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).collect() From the Job UI, I can only see 2 stages: 0-collect and 1-map. What happened to ShuffledRDD in reduceByKey? And both flatMap and map operations are collapsed into a

RDD memory and storage level option

2014-11-20 Thread Tsai Li Ming
Hi, This is on version 1.1.0. I did a simple test on the MEMORY_AND_DISK storage level. var file = sc.textFile("file:///path/to/file.txt").persist(StorageLevel.MEMORY_AND_DISK) file.count() The file is 1.5GB and there is only 1 worker. I have requested 1GB of worker memory per node:

Re: When does Spark switch from PROCESS_LOCAL to NODE_LOCAL or RACK_LOCAL?

2014-09-12 Thread Tsai Li Ming
Another observation I had was when reading from the local filesystem with “file://”: it was reported as PROCESS_LOCAL, which was confusing. Regards, Liming On 13 Sep, 2014, at 3:12 am, Nicholas Chammas nicholas.cham...@gmail.com wrote: Andrew, This email was pretty helpful. I feel like this stuff

[slurm-dev] Cyclic distribution problem

2014-06-30 Thread Tsai Li Ming
Hi, I’m running 2 slurmds on a single host (built with --enable-multiple-slurmd). The total cpus are divided equally among the 2 nodes. I’m trying to test the distribution modes=block/cyclic but the tasks are always allocated on the first node unless I use --ntasks-per-node=1 $ srun -n2

[slurm-dev] Reserved Partition name?

2014-04-05 Thread Tsai Li Ming
Hi, I am using the following partition name DEFAULT/default but slurmctld is not able to start. NodeName=compute State=UNKNOWN PartitionName=default Nodes=compute Default=YES MaxTime=INFINITE State=UP slurmctld: debug: Reading slurm.conf file: /opt/slurm-14.03.0/etc/slurm.conf slurmctld:

[slurm-dev] Mismatch in cpu configuration

2014-04-05 Thread Tsai Li Ming
Hi, I’m testing slurm on my vm. My compute node is defined in slurmd.conf without any CPU/Socket/Core/Thread information: NodeName=compute State=UNKNOWN # ./slurmd -C ClusterName=(null) NodeName=compute CPUs=2 Boards=1 SocketsPerBoard=1 CoresPerSocket=2 ThreadsPerCore=1 RealMemory=1463

Re: Hadoop LR comparison

2014-04-01 Thread Tsai Li Ming
://alpinenow.com/ On Mon, Mar 31, 2014 at 11:38 PM, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, Is the code available for Hadoop to calculate the Logistic Regression hyperplane? I’m looking at the Examples: http://spark.apache.org/examples.html, where there is the 110s vs 0.9s

Re: Configuring shuffle write directory

2014-03-28 Thread Tsai Li Ming
tell, spark.local.dir should *not* be set there, so workers should get it from their spark-env.sh. It’s true that if you set spark.local.dir in the driver it would pass that on to the workers for that job. Matei On Mar 27, 2014, at 9:57 PM, Tsai Li Ming mailingl...@ltsai.com wrote

Re: Configuring shuffle write directory

2014-03-27 Thread Tsai Li Ming
Can anyone help? How can I configure a different spark.local.dir for each executor? On 23 Mar, 2014, at 12:11 am, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, Each of my worker nodes has its own unique spark.local.dir. However, when I run spark-shell, the shuffle writes are always

Setting SPARK_MEM higher than available memory in driver

2014-03-27 Thread Tsai Li Ming
Hi, My worker nodes have more memory than the host from which I’m submitting my driver program, but it seems that SPARK_MEM is also setting the Xmx of the spark shell? $ SPARK_MEM=100g MASTER=spark://XXX:7077 bin/spark-shell Java HotSpot(TM) 64-Bit Server VM warning: INFO:

Re: Kmeans example reduceByKey slow

2014-03-24 Thread Tsai Li Ming
On Sun, Mar 23, 2014 at 3:15 AM, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, At the reduceByKey stage, it takes a few minutes before the tasks start working. I have -Dspark.default.parallelism=127 cores (n-1). CPU/Network/IO is idling across all nodes when this is happening

Re: Kmeans example reduceByKey slow

2014-03-24 Thread Tsai Li Ming
:53 PM, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, This is on a 4-node cluster, each node with 32 cores/256GB RAM. Spark (0.9.0) is deployed in standalone mode. Each worker is configured with 192GB. Spark executor memory is also 192GB. This is on the first iteration. K=50. Here's

[R] Running Rmpi/OpenMPI issues

2014-03-22 Thread Tsai Li Ming
Hi, I have R 3.0.3 and OpenMPI 1.6.5. Here’s my test script: library(snow) nbNodes <- 4 cl <- makeCluster(nbNodes, "MPI") clusterCall(cl, function() Sys.info()[c("nodename","machine")]) mpi.quit() And the mpirun command: /opt/openmpi-1.6.5-intel/bin/mpirun -np 1 -H host1,host2,host3,host4 R --no-save

Configuring shuffle write directory

2014-03-22 Thread Tsai Li Ming
Hi, Each of my worker nodes has its own unique spark.local.dir. However, when I run spark-shell, the shuffle writes are always written to /tmp despite spark.local.dir being set when the worker node is started. Specifying spark.local.dir for the driver program seems to override the executor setting? Is

Spark temp dir (spark.local.dir)

2014-03-13 Thread Tsai Li Ming
Hi, I'm confused about the -Dspark.local.dir and SPARK_WORKER_DIR(--work-dir). What's the difference? I have set -Dspark.local.dir for all my worker nodes but I'm still seeing directories being created in /tmp when the job is running. I have also tried setting -Dspark.local.dir when I run the

Re: Spark temp dir (spark.local.dir)

2014-03-13 Thread Tsai Li Ming
spark.local.dir can and should be set both on the executors and on the driver (if the driver broadcast variables, the files will be stored in this directory) Do you mean the worker nodes? Don’t think they are jetty connectors and the directories are empty:

data locality in logs

2014-02-05 Thread Tsai Li Ming
Hi, In older posts on Google Groups, there was mention of checking the logs on “preferred/non-preferred” for data locality. But I can’t seem to find this on 0.9.0 anymore? Has this been changed to “PROCESS_LOCAL” , like this: 14/02/06 13:51:45 INFO TaskSetManager: Starting task 9.0:50 as TID

ClassNotFoundException: PRCombiner

2014-02-03 Thread Tsai Li Ming
Hi, While running Bagel’s Wikipedia PageRank example (org.apache.spark.examples.bagel.WikipediaPageRank), I get this error at the end: org.apache.spark.SparkException: Job aborted: Task 3.0:4 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException:

Re: ClassNotFoundException: PRCombiner

2014-02-03 Thread Tsai Li Ming
On 4 Feb, 2014, at 10:08 am, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, While running Bagel’s Wikipedia PageRank example (org.apache.spark.examples.bagel.WikipediaPageRank), I get this error at the end: org.apache.spark.SparkException: Job aborted: Task 3.0:4 failed 4

Re: [CentOS] Cloud Computing

2009-07-20 Thread Tsai Li Ming
Bogdan Nicolescu wrote: - Original Message From: Tsai Li Ming lt...@osgdc.org To: CentOS mailing list centos@centos.org Sent: Monday, July 20, 2009 12:18:26 AM Subject: Re: [CentOS] Cloud Computing Hi, Bogdan Nicolescu wrote: - Original Message From

Re: [CentOS] Cloud Computing

2009-07-20 Thread Tsai Li Ming
Karanbir Singh wrote: Tsai Li Ming wrote: Also, we are going to start preparing ours to work with RHEL 5.4 when it is out in the coming months. Can the community wait till our 5.4 compatible version is ready. This may coincide with the Centos 5.4 release. The last time we had

Re: [CentOS] Cloud Computing

2009-07-19 Thread Tsai Li Ming
Hi, Bogdan Nicolescu wrote: - Original Message From: Ryan J M sync@gmail.com To: CentOS mailing list centos@centos.org Sent: Saturday, July 18, 2009 8:59:02 AM Subject: Re: [CentOS] Cloud Computing On Sat, Jul 18, 2009 at 4:36 AM, Matt wrote: Is anyone creating a

Re: [CentOS] Set hostname via DHCP ?

2009-06-29 Thread Tsai Li Ming
Niki Kovacs wrote: Niki Kovacs a écrit : If I take a look at /var/lib/dhclient/dhclient-eth0.leases (on the client), here's a summary of the lease: lease { interface eth0; fixed-address 192.168.1.2; option subnet-mask 255.255.255.0; option routers 192.168.1.254; option

Re: [CentOS] filesystem rpm fails when /home is NFS mounted

2009-04-02 Thread Tsai Li Ming
Scott Silva wrote: on 4-2-2009 2:00 PM Anne Wilson spake the following: On Thursday 02 April 2009 21:40:59 R P Herrold wrote: On Wed, 1 Apr 2009, Paul Heinlein wrote: I don't know if it's a bug or a feature, but the filesystem-2.4.0-2.el5.centos rpm won't upgrade cleanly if /home is an NFS

Re: [CentOS] filesystem rpm fails when /home is NFS mounted

2009-04-02 Thread Tsai Li Ming
R P Herrold wrote: Thank you for the confirmation, I have not had a chance to file in the centos tracker yet, and hope to get it filed during tomorrow's business hours. Similarly I have not checked upstream's tracker yet. If needed, I'll file there as well, but I cannot imagine it will be

Re: [Kusu-users] repopatch

2008-04-10 Thread Tsai Li Ming
Hi J, Which kernel updates are you only interested in? There are entries in the database that are related to the updated kernels. fyi, kernel-xen is not used right now. -Liming Jay wrote: How do I select only some of the kernel updates found when running repopatch? Or do I have to

[Mailman-Users] Getting notified for subscription bounce

2006-11-11 Thread Tsai Li Ming
Hi, Is it possible for the list owner to get a bounce when the confirmation email does not get sent out to the subscriber, usually because of a bounce or a 550? I have the following logs in my postfix but the owner is not getting any bounces: Nov 11 13:43:46 mail postfix/local[19345]: 313C22FED3:

dns lookup for reverse proxy

2004-09-22 Thread Tsai Li Ming
Dear all, I have the following directives in my conf file. <ifmodule mod_proxy.c> proxyrequests off RewriteEngine On ProxyPass /Server/ http://localhost:8081 ProxyPassReverse /Server/ http://localhost:8081 RewriteRule ^/Server$

Re: dns lookup for reverse proxy

2004-09-22 Thread Tsai Li Ming
://localhost:8081/ ProxyPassReverse /Server/ http://localhost:8081/ </ifmodule> you shouldn't need the other stuff to make it work On 22/09/2004, at 1:07 PM, Tsai Li Ming wrote: Dear all, I have the following directives in my conf file. <ifmodule mod_proxy.c> proxyrequests off RewriteEngine

[Samba] cp input/output error

2004-09-19 Thread Tsai Li Ming
Hi, I have been getting random input/output errors when trying to cp an ISO (100mb) to a samba mount point. I get the same random error when I try to cp a txt file over too. cp: writing `/public/cd.iso': Input/output error my fstab: //fserv/public /public smbfs fmask=666,username=,password=