Data type transformation when creating an RDD object

2016-02-17 Thread Lin, Hao
Hi, quick question on data type transformation when creating an RDD object. I want to create a person object with "name" and DOB (date of birth): case class Person(name: String, DOB: java.sql.Date). Then I want to create an RDD from a text file without the header, e.g. "name" and "DOB". I have
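
A minimal sketch of one way to do that parse (hypothetical layout: comma-separated name and date, header already removed; java.sql.Date.valueOf expects the yyyy-MM-dd form):

    case class Person(name: String, DOB: java.sql.Date)

    // each line assumed to look like "Alice,1980-01-31"
    val people = sc.textFile("people.txt").map { line =>
      val fields = line.split(",")
      Person(fields(0).trim, java.sql.Date.valueOf(fields(1).trim))
    }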

SSE in s3

2016-02-12 Thread Lin, Hao
Hi, can we configure Spark to enable SSE (Server-Side Encryption) when saving files to S3? Much appreciated, thanks.
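
One hedged sketch, assuming the S3A connector and a Hadoop build that supports the property (it requests SSE-S3 with AES256; the older s3n connector may not honor it):

    // set on the Hadoop configuration before writing; myRdd is illustrative
    sc.hadoopConfiguration.set("fs.s3a.server-side-encryption-algorithm", "AES256")
    myRdd.saveAsTextFile("s3a://my-bucket/output/")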

sc.textFile the number of the workers to parallelize

2016-02-04 Thread Lin, Hao
Hi, I have a question about the number of workers Spark uses to parallelize file loading with sc.textFile. When I use sc.textFile to access multiple files in AWS S3, it seems to use only 2 workers regardless of how many worker nodes I have in my cluster. So how does Spark
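
For what it's worth, textFile accepts a minPartitions hint, so a sketch like this (the number 64 is illustrative) can raise the parallelism when the input format is splittable:

    // ask for at least 64 partitions instead of the small default
    val lines = sc.textFile("s3n://my-bucket/myfolder/", minPartitions = 64)
    // or shuffle an already-loaded RDD across more partitions:
    val spread = lines.repartition(64)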

RE: try to read multiple bz2 files in s3

2016-02-02 Thread Lin, Hao
Hi Xiangrui, For the following problem, I found an issue ticket you posted earlier: https://issues.apache.org/jira/browse/HADOOP-10614 I wonder if this has been fixed in Spark 1.5.2, which I believe it has. Any suggestion on how to fix it? Thanks. Hao From: Lin, Hao [mailto:hao@finra.org
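
For context, HADOOP-10614 tracks a thread-safety bug in Hadoop's bzip2 codec (CBZip2InputStream); whether a given Spark 1.5.2 build has the fix depends on the Hadoop version it bundles. A hedged stopgap, assuming the race is between concurrent tasks in one executor JVM:

    import org.apache.spark.SparkConf

    // one concurrent task per executor, so bzip2 decompressors don't race
    // (trades away throughput; a workaround, not a fix)
    val conf = new SparkConf().set("spark.executor.cores", "1")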

RE: try to read multiple bz2 files in s3

2016-02-02 Thread Lin, Hao
Hi Robert, I just use textFile. Here is the simple code: val fs3File = sc.textFile("s3n://my bucket/myfolder/"); fs3File.count. Do you suggest I use sc.parallelize instead? Many thanks. From: Robert Collich [mailto:rcoll...@gmail.com] Sent: Monday, February 01, 2016 6:54 PM To: Lin,
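
For context, sc.parallelize distributes an in-memory collection and does not read files, so it would not replace textFile here; a quick sketch of the difference (values illustrative):

    // parallelize: spread a driver-side collection across the cluster
    val nums = sc.parallelize(1 to 1000, numSlices = 8)
    // textFile: read files from S3/HDFS/local storage
    val lines = sc.textFile("s3n://my-bucket/myfolder/")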

try to read multiple bz2 files in s3

2016-02-01 Thread Lin, Hao
When I tried to read multiple bz2 files from S3, I got the following warning messages. What is the problem here? 16/02/01 22:30:30 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 10.162.67.248): java.lang.ArrayIndexOutOfBoundsException: -1844424343 at

SPARK_WORKER_INSTANCES deprecated

2016-02-01 Thread Lin, Hao
Can I still use SPARK_WORKER_INSTANCES in conf/spark-env.sh? The following is what I got after setting this parameter and running spark-shell: SPARK_WORKER_INSTANCES was detected (set to '32'). This is deprecated in Spark 1.0+. Please instead use: - ./spark-submit with --num-executors to
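
The deprecation message points at spark-submit; a hedged example of the suggested replacement (flag values illustrative, and --num-executors applies in YARN mode):

    ./bin/spark-submit --num-executors 32 --executor-cores 1 --class MyApp my-app.jar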

RE: SPARK_WORKER_INSTANCES deprecated

2016-02-01 Thread Lin, Hao
If you look at the Spark doc, the variable SPARK_WORKER_INSTANCES can still be specified, but not yet SPARK_EXECUTOR_INSTANCES: http://spark.apache.org/docs/1.5.2/spark-standalone.html From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Monday, February 01, 2016 5:45 PM To: Lin, Hao Cc: user Subject

how to access local file from Spark sc.textFile("file:///path to/myfile")

2015-12-11 Thread Lin, Hao
Hi, I have a problem accessing a local file, for example: sc.textFile("file:///root/2008.csv").count() fails with the error: File file:/root/2008.csv does not exist. The file clearly exists, since if I mistype the file name to a non-existent one, it shows: Error: Input path does not

RE: how to access local file from Spark sc.textFile("file:///path to/myfile")

2015-12-11 Thread Lin, Hao
Here you go, thanks. -rw-r--r-- 1 root root 658M Dec 9 2014 /root/2008.csv From: Vijay Gharge [mailto:vijay.gha...@gmail.com] Sent: Friday, December 11, 2015 12:31 PM To: Lin, Hao Cc: user@spark.apache.org Subject: Re: how to access local file from Spark sc.textFile("file:///path to/m

RE: how to access local file from Spark sc.textFile("file:///path to/myfile")

2015-12-11 Thread Lin, Hao
Yes to your question. I spun up a cluster, logged in to the master as the root user, ran spark-shell, and referenced the local file on the master machine. From: Vijay Gharge [mailto:vijay.gha...@gmail.com] Sent: Friday, December 11, 2015 12:50 PM To: Lin, Hao Cc: user@spark.apache.org Subject: Re

RE: how to access local file from Spark sc.textFile("file:///path to/myfile")

2015-12-11 Thread Lin, Hao
To: Lin, Hao Cc: user@spark.apache.org Subject: Re: how to access local file from Spark sc.textFile("file:///path to/myfile") Hm, are you referencing a local file from your remote workers? That won't work, as the file only exists on one machine (I presume). On Fri, Dec 11, 2015 at 5:19 PM
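
If the file really lives only on the master, two commonly suggested workarounds (a sketch; paths illustrative):

    // 1) put the file on shared storage first, e.g. (shell, on the master):
    //      hadoop fs -put /root/2008.csv /data/2008.csv
    //    then read it from every node:
    val n = sc.textFile("hdfs:///data/2008.csv").count()

    // 2) or copy the file to the same local path on every worker node,
    //    so file:///root/2008.csv resolves everywhere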

RE: Graph visualization tool for GraphX

2015-12-10 Thread Lin, Hao
Hi Andy, quick question: does Spark Notebook include its own Spark engine, or do I need to install Spark separately and point Spark Notebook to it? Thanks. From: Lin, Hao [mailto:hao@finra.org] Sent: Tuesday, December 08, 2015 7:01 PM To: andy petrella; Jörn Franke Cc: user@spark.apache.org

Graph visualization tool for GraphX

2015-12-08 Thread Lin, Hao
Hi, can anyone recommend a good graph visualization tool for GraphX that can handle truly large data (~ TB)? Thanks so much. Hao

RE: Graph visualization tool for GraphX

2015-12-08 Thread Lin, Hao
specific ☺. Thanks, Hao. From: Jörn Franke [mailto:jornfra...@gmail.com] Sent: Tuesday, December 08, 2015 11:31 AM To: Lin, Hao Cc: user@spark.apache.org Subject: Re: Graph visualization tool for GraphX I am not sure about your use case. How should a human interpret many terabytes of data in one large

RE: Graph visualization tool for GraphX

2015-12-08 Thread Lin, Hao
Thanks Andy, I will certainly give your suggestion a try. From: andy petrella [mailto:andy.petre...@gmail.com] Sent: Tuesday, December 08, 2015 1:21 PM To: Lin, Hao; Jörn Franke Cc: user@spark.apache.org Subject: Re: Graph visualization tool for GraphX Hello Lin, This is indeed a tough

Is Temporary Access Credential (AccessKeyId, SecretAccessKey + SecurityToken) supported by Spark?

2015-12-04 Thread Lin, Hao
Hi, does anyone know whether Spark running in AWS supports temporary access credentials (AccessKeyId, SecretAccessKey + SecurityToken) for accessing S3? I only see references to specifying fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey, without any mention of a security token. Apparently this is only
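
For reference, session-token support arrived later in the S3A connector (Hadoop 2.8+); a hedged sketch of that configuration, which would not apply to the older s3/s3n properties mentioned above:

    val hc = sc.hadoopConfiguration
    hc.set("fs.s3a.aws.credentials.provider",
      "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
    hc.set("fs.s3a.access.key", "ACCESS_KEY_ID")        // placeholders
    hc.set("fs.s3a.secret.key", "SECRET_ACCESS_KEY")
    hc.set("fs.s3a.session.token", "SECURITY_TOKEN")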

RE: Is Temporary Access Credential (AccessKeyId, SecretAccessKey + SecurityToken) supported by Spark?

2015-12-04 Thread Lin, Hao
Thanks, I will keep an eye on it. From: Michal Klos [mailto:michal.klo...@gmail.com] Sent: Friday, December 04, 2015 1:50 PM To: Lin, Hao Cc: user Subject: Re: Is Temporary Access Credential (AccessKeyId, SecretAccessKey + SecurityToken) supported by Spark? We were looking into this as well

RE: starting spark-shell throws /tmp/hive on HDFS should be writable error

2015-12-02 Thread Lin, Hao
Mich, did you run this locally or on EC2 (I use EC2)? Is this problem universal or specific to, say, EC2? Many thanks. From: Mich Talebzadeh [mailto:m...@peridale.co.uk] Sent: Wednesday, December 02, 2015 5:01 PM To: Lin, Hao; user@spark.apache.org Subject: RE: starting spark-shell throws /tmp

RE: starting spark-shell throws /tmp/hive on HDFS should be writable error

2015-12-02 Thread Lin, Hao
I actually don't have the folder /tmp/hive created on my master node; is that a problem? From: Mich Talebzadeh [mailto:m...@peridale.co.uk] Sent: Wednesday, December 02, 2015 5:40 PM To: Lin, Hao; user@spark.apache.org Subject: RE: starting spark-shell throws /tmp/hive on HDFS should be writable
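
The commonly cited fix, assuming the error refers to the HDFS path, is to create the directory and open up its permissions:

    hdfs dfs -mkdir -p /tmp/hive
    hdfs dfs -chmod 777 /tmp/hive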

Re: The Processing loading of Spark streaming on YARN is not in balance

2015-04-30 Thread Lin Hao Xu
It seems that the data size is only 2.9MB, far less than the default RDD size. How about putting more data into Kafka? And what about the number of topic partitions in Kafka? Best regards, Lin Hao XU IBM Research China Email: xulin...@cn.ibm.com My Flickr: http://www.flickr.com/photos/xulinhao
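
A hedged sketch of one way to spread a small input across more tasks (Spark 1.x streaming; kafkaStream stands for the DStream returned by KafkaUtils and is illustrative), though adding Kafka topic partitions is the more direct fix:

    // redistribute each incoming batch across the cluster before heavy processing
    val rebalanced = kafkaStream.repartition(16)
    rebalanced.foreachRDD { rdd => println(rdd.count()) }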

Re: Re: implicit function in SparkStreaming

2015-04-29 Thread Lin Hao Xu
For your question, I think the discussion in this link can help: http://apache-spark-user-list.1001560.n3.nabble.com/Error-related-to-serialisation-in-spark-streaming-td6801.html Best regards, Lin Hao XU IBM Research China Email: xulin...@cn.ibm.com My Flickr: http://www.flickr.com/photos

Re: A problem of using spark streaming to capture network packets

2015-04-28 Thread Lin Hao Xu
3. We also tested List<PcapNetworkInterface> nifs = Pcaps.findAllDevs() in a standard Java program, and it worked like a champion. Best regards, Lin Hao XU IBM Research China Email: xulin...@cn.ibm.com My Flickr: http://www.flickr.com/photos/xulinhao/sets From: Dean Wampler deanwamp

Re: A problem of using spark streaming to capture network packets

2015-04-28 Thread Lin Hao Xu
BTW, from the Spark web UI, the ACL is marked as root. Best regards, Lin Hao XU IBM Research China Email: xulin...@cn.ibm.com My Flickr: http://www.flickr.com/photos/xulinhao/sets From: Dean Wampler deanwamp...@gmail.com To: Lin Hao Xu/China/IBM@IBMCN Cc: Hai Shan Wu/China/IBM@IBMCN

Re: A problem of using spark streaming to capture network packets

2015-04-28 Thread Lin Hao Xu
Actually, to simplify this problem, we run our program on a single machine with 4 slave workers. Since it is a single machine, I think all slave workers run with root privileges. BTW, if we have a cluster, how do we make sure slaves on remote machines run the program as root? Best regards, Lin Hao XU