If you are getting a ClassNotFoundException, then you should use the --jars
option (of spark-submit) to submit those jars.
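For reference, a typical invocation looks like the following (the class name, jar names, and master URL here are illustrative, not taken from the thread):

```
./bin/spark-submit \
  --class com.example.Main \
  --master spark://master:7077 \
  --jars /path/to/dep1.jar,/path/to/dep2.jar \
  /path/to/myapp.jar
```

--jars takes a comma-separated list; those jars are shipped to the executors and added to their classpaths alongside the application jar.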
Thanks
Best Regards
On Sun, Dec 21, 2014 at 10:01 AM, Tao Lu taolu2...@gmail.com wrote:
Hi, Guys,
I have some code which runs well using the spark-submit command.
I'm only speculating, but I wonder if it was on purpose? Would people
ever build an app against the REPL?
On Sun, Dec 21, 2014 at 5:50 AM, Peng Cheng pc...@uow.edu.au wrote:
Everything else is there except spark-repl. Can someone check that out this
weekend?
Hari,
Thanks for the details and sorry for the late reply. Currently Spark SQL
doesn’t enable broadcast join optimization for left outer join, thus
shuffles are required to perform this query. I made a quite artificial
test to show the physical plan of your query:
== Physical Plan ==
Evert - Thanks for the instructions, this is generally useful in other
scenarios, but I think this isn’t what Shahab needs, because
saveAsTable actually saves the contents of the SchemaRDD into Hive.
Shahab - As Michael has answered in another thread, you may try
On 12/17/14 1:43 PM, Jerry Raj wrote:
Hi,
I'm using the Scala DSL for Spark SQL, but I'm not able to do joins. I
have two tables (backed by Parquet files) and I need to do a join
across them using a common field (user_id). This works fine using
standard SQL but not using the
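For context, a rough sketch of what a join looked like in the 1.2-era SchemaRDD DSL (the file and column names below are illustrative, and this is my reading of that API rather than the poster's code):

```scala
// Sketch only: Spark 1.2-era SchemaRDD DSL join (not the later DataFrame API).
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.catalyst.plans.Inner

val sqlContext = new SQLContext(sc)
import sqlContext._  // brings the symbol-based DSL into scope

val users  = parquetFile("users.parquet")   // hypothetical: has column id
val events = parquetFile("events.parquet")  // hypothetical: has column user_id

// Symbols name columns in the join condition. Note that when both sides
// share the same column name (as with user_id in the poster's case), the
// condition becomes ambiguous and one side usually needs aliasing first.
val joined = users.join(events, Inner, Some('id === 'user_id))
```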
Hi,
I am running code which takes a network file location (not HDFS) as
input. But sc.textFile("networklocation\\README.md") can't recognize
the network location as a valid location; does it only
accept HDFS- and local-style file name formats?
Does anyone have an idea how I can use a
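For what it's worth, sc.textFile takes a URI and the scheme determines the filesystem; the paths below are illustrative, and mounting a Windows share is my assumption, not something stated in the thread:

```scala
// Sketch: sc.textFile resolves paths by URI scheme.
val localFile = sc.textFile("file:///mnt/share/README.md")          // local or mounted path
val hdfsFile  = sc.textFile("hdfs://namenode:8020/data/README.md")  // HDFS path
// A raw \\host\share UNC path is not a URI scheme Spark understands; mounting
// the share on every node and reading it via file:// is one common workaround.
```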
Could you please file a JIRA together with the Git commit you're using?
Thanks!
On 12/18/14 2:32 AM, Hao Ren wrote:
Hi,
When running SparkSQL branch 1.2.1 on EC2 standalone cluster, the following
query does not work:
create table debug as
select v1.*
from t1 as v1 left join t2 as v2
on
Hi Schweichler,
This is an interesting and practical question. I'm not familiar with how
Tableau works, but would like to share some thoughts.
In general, big data analytics frameworks like MR and Spark tend to
perform immutable functional transformations over immutable data. Whilst
in your
Actually yes, things like interactive notebooks, for instance.
On Sun Dec 21 2014 at 11:35:18 AM Sean Owen so...@cloudera.com wrote:
I'm only speculating, but I wonder if it was on purpose? Would people
ever build an app against the REPL?
On Sun, Dec 21, 2014 at 5:50 AM, Peng Cheng pc...@uow.edu.au
Hi All,
When I try to load a folder into RDDs, is there any way for me to find the
input file name for a particular partition? So I can track which file each
partition came from.
In Hadoop, I can find this information through the code:
FileSplit fileSplit = (FileSplit) context.getInputSplit();
String
I just found a possible answer:
http://themodernlife.github.io/scala/spark/hadoop/hdfs/2014/09/28/spark-input-filename/
I'll give it a try. Although it is a bit troublesome, if it works,
it will give me what I want.
Sorry to bother everyone here.
Regards,
Shuai
On Sun, Dec 21, 2014 at 4:43
With JDBC you often need to load the class so it can register the driver at
the beginning of your program. Usually this is something like:
Class.forName("com.mysql.jdbc.Driver");
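A minimal sketch of that pattern (the helper below is mine, not from the thread; the MySQL class name comes from the snippet above and only resolves if the connector jar is on the classpath, e.g. added via --jars):

```scala
// Sketch: loading a JDBC driver class by name. Class.forName runs the class's
// static initializer, which for a JDBC driver registers it with
// java.sql.DriverManager.
def loadDriver(className: String): Boolean =
  try { Class.forName(className); true }
  catch { case _: ClassNotFoundException => false }

loadDriver("com.mysql.jdbc.Driver") // true only if the connector jar is present
```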
On Fri, Dec 19, 2014 at 3:47 PM, durga durgak...@gmail.com wrote:
Hi I am facing an issue with mysql jars with
Yeah..., but apparently the problem with mapPartitionsWithInputSplit
is that it is tagged as DeveloperApi. Because of that,
I'm not sure that it's a good idea to use the function.
For this problem, I had to create a subclass of HadoopRDD and use
mapPartitions instead.
Is there any reason
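For readers landing here, a sketch of the @DeveloperApi route discussed above (the input path and key/value types are illustrative):

```scala
// Sketch: tagging each record with its input file via the @DeveloperApi
// HadoopRDD.mapPartitionsWithInputSplit.
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileSplit, TextInputFormat}
import org.apache.spark.rdd.HadoopRDD

val rdd = sc.hadoopFile[LongWritable, Text, TextInputFormat]("/data/input")

val withFileName = rdd.asInstanceOf[HadoopRDD[LongWritable, Text]]
  .mapPartitionsWithInputSplit { (split, iter) =>
    // Each partition of a HadoopRDD corresponds to one InputSplit; for
    // file-based input formats that split is a FileSplit carrying the path.
    val file = split.asInstanceOf[FileSplit].getPath.toString
    iter.map { case (_, line) => (file, line.toString) }
  }
```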
Hi all,
I've just launched a new Amazon EMR cluster and used the script at:
s3://support.elasticmapreduce/spark/install-spark
to install Spark (this script was upgraded to support 1.2).
I know there are tools to launch a Spark cluster in EC2, but I want to use
EMR.
Everything installs fine;
I've pushed out an implementation of locality sensitive hashing for Spark.
LSH has a number of use cases, the most prominent being when the features are
not based in Euclidean space.
Code, documentation, and a small exemplar dataset are available on github:
https://github.com/mrsqueeze/spark-hash
Feel
Hi all,
I understand that parquet allows for schema versioning automatically in the
format; however, I'm not sure whether Spark supports this.
I'm saving a SchemaRDD to a parquet file, registering it as a table, then
doing an insertInto with a SchemaRDD with an extra column.
The second
Hi All,
I tried to make a combined.jar in a shell script. It works when I am using
spark-shell, but with spark-submit it is the same issue.
Help is highly appreciated.
Thanks
-D
One more question.
How would I submit additional jars to the spark-submit job? I used the --jars
option, but it does not seem to work, as explained earlier.
Thanks for the help,
-D
Hi All,
I am sporadically facing a strange issue: occasionally my Spark job
hangs while reading S3 files. It is not throwing an exception or making any
progress; it just hangs there.
Is this a known issue? Please let me know how I could solve it.
Thanks,
-D
Looks interesting thanks for sharing.
Does it support cosine similarity? I only saw Jaccard mentioned at a quick
glance.
On Mon, Dec 22, 2014 at 4:12 AM, morr0723 michael.d@gmail.com wrote:
I've pushed out an implementation of locality sensitive hashing for
I’m trying to run the stateful network word count at
https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/stateful_network_wordcount.py
using the command:
./bin/spark-submit
examples/src/main/python/streaming/stateful_network_wordcount.py
localhost
I am also
Hi All,
I am facing a problem when using TTL with TorrentBroadcastFactory in
Spark-1.2.0.
My code is as follows:
val conf = new SparkConf().
setAppName("TTL_Broadcast_vars").
setMaster("local").
//set("spark.broadcast.factory",