Hi,
Quick question on data type transformation when creating an RDD.
I want to create a Person object with "name" and "DOB" (date of birth):
case class Person(name: String, DOB: java.sql.Date)
Then I want to create an RDD from a text file, skipping the header line (i.e. "name"
and "DOB"). I have
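The snippet above cuts off, but a minimal sketch of the parsing step might look like the following (assuming a comma-separated file whose dates are written as yyyy-MM-dd; the helper name and file layout are illustrative):

```scala
import java.sql.Date

case class Person(name: String, DOB: Date)

// Parse one data line of the form "name,yyyy-MM-dd" into a Person.
// Date.valueOf expects exactly the yyyy-MM-dd format.
def parsePerson(line: String): Person = {
  val fields = line.split(",").map(_.trim)
  Person(fields(0), Date.valueOf(fields(1)))
}

// With a SparkContext `sc`, drop the header line and map the rest:
// val people = sc.textFile("people.csv")
//   .mapPartitionsWithIndex((i, it) => if (i == 0) it.drop(1) else it)
//   .map(parsePerson)
```

Date.valueOf throws IllegalArgumentException on malformed input, so a real job would likely wrap the parse in scala.util.Try and filter out failures.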
Hi,
Can we configure Spark to enable SSE (Server Side Encryption) for saving files
to s3?
much appreciated!
thanks
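One possible knob, sketched below and not verified against this exact setup: the Hadoop S3A connector exposes a server-side-encryption property that Spark picks up from its Hadoop configuration (the property name assumes a Hadoop build with S3A SSE support):

```scala
// Sketch: request SSE-S3 (AES256) for files written through the s3a connector.
// Assumes `sc` is the SparkContext and a Hadoop version whose s3a supports SSE.
sc.hadoopConfiguration.set("fs.s3a.server-side-encryption-algorithm", "AES256")
```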
Confidentiality Notice: This email, including attachments, may include
non-public, proprietary, confidential or legally privileged information. If
you are not an intended
Hi,
I have a question on the number of workers that Spark uses to parallelize the
loading of files with sc.textFile. When I use sc.textFile to access multiple
files in AWS S3, it seems to use only 2 workers regardless of how many
worker nodes I have in my cluster. So how does Spark
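One relevant detail here: sc.textFile accepts an optional minPartitions argument, so the 2 tasks seen are just the default lower bound. A sketch, assuming `sc` is the SparkContext and the bucket path is illustrative:

```scala
// Ask for at least 64 input partitions instead of the default minimum of 2.
// The actual count may be higher, driven by the input's split boundaries.
val lines = sc.textFile("s3n://my-bucket/myfolder/", minPartitions = 64)
println(lines.partitions.length)
```

More partitions give the scheduler more tasks to spread across workers; whether all workers are actually used also depends on executor and core settings.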
Hi Xiangrui,
For the following problem, I found out an issue ticket you posted before
https://issues.apache.org/jira/browse/HADOOP-10614
I wonder if this has been fixed in Spark 1.5.2, which I believe it has. Any
suggestion on how to fix it?
Thanks
Hao
From: Lin, Hao [mailto:hao@finra.org
Hi Robert,
I just use textFile. Here is the simple code:
val fs3File=sc.textFile("s3n://my bucket/myfolder/")
fs3File.count
do you suggest I should use sc.parallelize?
many thanks
From: Robert Collich [mailto:rcoll...@gmail.com]
Sent: Monday, February 01, 2016 6:54 PM
To: Lin,
When I try to read multiple bz2 files from S3, I get the following warning
messages. What is the problem here?
16/02/01 22:30:30 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0,
10.162.67.248): java.lang.ArrayIndexOutOfBoundsException: -1844424343
at
Can I still use SPARK_WORKER_INSTANCES in conf/spark-env.sh? The following is
what I got after setting this parameter and running spark-shell:
SPARK_WORKER_INSTANCES was detected (set to '32').
This is deprecated in Spark 1.0+.
Please instead use:
- ./spark-submit with --num-executors to
If you look at the Spark docs, the variable SPARK_WORKER_INSTANCES can still be
specified, and yet the SPARK_EXECUTOR_INSTANCES
http://spark.apache.org/docs/1.5.2/spark-standalone.html
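As the deprecation warning itself suggests, the per-application way to request multiple executors is a spark-submit flag; a sketch (class and jar names are illustrative, and --num-executors is primarily honored on YARN):

```shell
# Replacement for the deprecated SPARK_WORKER_INSTANCES (deprecated since Spark 1.0).
./bin/spark-submit \
  --num-executors 32 \
  --executor-cores 2 \
  --class com.example.MyApp \
  myapp.jar
```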
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Monday, February 01, 2016 5:45 PM
To: Lin, Hao
Cc: user
Subject
Hi,
I have a problem accessing a local file, for example:
sc.textFile("file:///root/2008.csv").count()
fails with the error: File file:/root/2008.csv does not exist.
The file clearly exists, since if I mistype the file name as a
non-existent one, it shows:
Error: Input path does not
Here you go, thanks.
-rw-r--r-- 1 root root 658M Dec 9 2014 /root/2008.csv
From: Vijay Gharge [mailto:vijay.gha...@gmail.com]
Sent: Friday, December 11, 2015 12:31 PM
To: Lin, Hao
Cc: user@spark.apache.org
Subject: Re: how to access local file from Spark sc.textFile("file:///path
to/m
Yes to your question. I spun up a cluster, logged in to the master as the root
user, ran spark-shell, and referenced a local file on the master machine.
From: Vijay Gharge [mailto:vijay.gha...@gmail.com]
Sent: Friday, December 11, 2015 12:50 PM
To: Lin, Hao
Cc: user@spark.apache.org
Subject: Re
To: Lin, Hao
Cc: user@spark.apache.org
Subject: Re: how to access local file from Spark sc.textFile("file:///path
to/myfile")
Hm, are you referencing a local file from your remote workers? That won't work,
as the file only exists on one machine (I presume).
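Two common workarounds, sketched under the assumption that the file lives only on the master (paths are illustrative):

```scala
import scala.io.Source

// Option 1 (small files only): read on the driver, then parallelize.
// A 658 MB file is likely too big for this; shown only for the pattern.
// Assumes `sc` is the SparkContext.
val lines = Source.fromFile("/root/2008.csv").getLines().toSeq
val rdd = sc.parallelize(lines)

// Option 2 (preferred for large files): put the file on shared storage, e.g.
//   hdfs dfs -put /root/2008.csv /data/2008.csv
// and read it from every worker:
//   val rdd2 = sc.textFile("hdfs:///data/2008.csv")
```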
On Fri, Dec 11, 2015 at 5:19 PM
Hi Andy, quick question: does Spark-Notebook include its own Spark engine, or do I
need to install Spark separately and point to it from Spark Notebook? thanks
From: Lin, Hao [mailto:hao@finra.org]
Sent: Tuesday, December 08, 2015 7:01 PM
To: andy petrella; Jörn Franke
Cc: user@spark.apache.org
Hi,
Can anyone recommend a good graph visualization tool for GraphX that can
handle truly large data (~ TB)?
Thanks so much
Hao
specific ☺. Thanks
hao
From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: Tuesday, December 08, 2015 11:31 AM
To: Lin, Hao
Cc: user@spark.apache.org
Subject: Re: Graph visualization tool for GraphX
I am not sure about your use case. How should a human interpret many terabytes
of data in one large
Thanks Andy, I certainly will give a try to your suggestion.
From: andy petrella [mailto:andy.petre...@gmail.com]
Sent: Tuesday, December 08, 2015 1:21 PM
To: Lin, Hao; Jörn Franke
Cc: user@spark.apache.org
Subject: Re: Graph visualization tool for GraphX
Hello Lin,
This is indeed a tough
Hi,
Does anyone know whether Spark running in AWS supports temporary access
credentials (AccessKeyId, SecretAccessKey + SecurityToken) for accessing S3? I only
see references to specifying fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey,
without any mention of a security token. Apparently this is only
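For what it's worth, later Hadoop S3A versions do support session tokens; a sketch of the configuration (property names assume Hadoop 2.8+ with the s3a connector, and the placeholder values are illustrative):

```scala
// Sketch: configure s3a to use temporary STS credentials.
// Assumes `sc` is the SparkContext.
val conf = sc.hadoopConfiguration
conf.set("fs.s3a.aws.credentials.provider",
  "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
conf.set("fs.s3a.access.key", "<AccessKeyId>")
conf.set("fs.s3a.secret.key", "<SecretAccessKey>")
conf.set("fs.s3a.session.token", "<SecurityToken>")
```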
Thanks, I will keep an eye on it.
From: Michal Klos [mailto:michal.klo...@gmail.com]
Sent: Friday, December 04, 2015 1:50 PM
To: Lin, Hao
Cc: user
Subject: Re: Is Temporary Access Credential (AccessKeyId, SecretAccessKey +
SecurityToken) support by Spark?
We were looking into this as well
Mich, did you run this locally or on EC2 (I use EC2)? Is this problem
universal or specific to, say, EC2? Many thanks
From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
Sent: Wednesday, December 02, 2015 5:01 PM
To: Lin, Hao; user@spark.apache.org
Subject: RE: starting spark-shell throws /tmp
I actually don't have the folder /tmp/hive created on my master node; is that a
problem?
From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
Sent: Wednesday, December 02, 2015 5:40 PM
To: Lin, Hao; user@spark.apache.org
Subject: RE: starting spark-shell throws /tmp/hive on HDFS should be writable
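The usual fix for this error, sketched below, is to create the HDFS scratch directory and relax its permissions (the directory and mode follow the common advice for this message; adjust to your security policy):

```shell
# Create Hive's scratch dir on HDFS and make it world-writable.
hdfs dfs -mkdir -p /tmp/hive
hdfs dfs -chmod 777 /tmp/hive
```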
It seems that the data size is only 2.9 MB, far less than the default RDD
size. How about putting more data into Kafka? And what about the number of
topic partitions in Kafka?
Best regards,
Lin Hao XU
IBM Research China
Email: xulin...@cn.ibm.com
My Flickr: http://www.flickr.com/photos/xulinhao
For your question, I think the discussion at this link can help.
http://apache-spark-user-list.1001560.n3.nabble.com/Error-related-to-serialisation-in-spark-streaming-td6801.html
Best regards,
Lin Hao XU
3. We also tested List<PcapNetworkInterface> nifs = Pcaps.findAllDevs() in
a standard Java program; it worked like a champion.
Best regards,
Lin Hao XU
From: Dean Wampler deanwamp
btw, from spark web ui, the acl is marked with root
Best regards,
Lin Hao XU
From: Dean Wampler deanwamp...@gmail.com
To: Lin Hao Xu/China/IBM@IBMCN
Cc: Hai Shan Wu/China/IBM@IBMCN
Actually, to simplify the problem, we run our program on a single machine
with 4 slave workers. Since it is a single machine, I think all slave workers
run with root privileges.
BTW, if we have a cluster, how do we make sure slaves on remote machines run
the program as root?
Best regards,
Lin Hao XU