Or Databricks Delta (announced at Spark Summit) or IBM Event Store, depending
on the use case.
On Oct 31, 2017, at 14:30, Joseph Pride wrote:
Folks:
SnappyData.
I’m fairly new to working with it myself, but it looks pretty promising.
Pros:
No need for Scala skills; Java can be used.
Other companies are already doing it.
Supports YARN execution, but not only YARN…
Complex import use cases can easily be done in Java (see
https://spark-summit.org/eu-2017/events/extending-apache-sparks-ingestion-building-your-own-java-data-source/).
I have seen a similar scenario where we load data from an RDBMS into a NoSQL
database… Spark made sense for velocity and parallel processing (and the cost of
licenses :) ).
> On Oct 15, 2017, at 21:29, Saravanan Thirumalai wrote:
>
> We are an Investment firm
SK,
Have you considered:
Dataset<Row> df = spark.read().json(dfWithStringRowsContainingJson);
jg
> On Oct 11, 2017, at 16:35, sk skk wrote:
>
> Can we create a dataframe from a Java pair RDD of Strings? I don’t have a
> schema, as it will be dynamic JSON. I gave
Something along the lines of:
Dataset<Row> df = spark.read().json(jsonDf); ?
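For what it's worth, a minimal end-to-end sketch of that idea in Java, assuming Spark 2.2 (where spark.read().json() accepts a Dataset<String>); the class name and sample documents are made up for illustration:

import java.util.Arrays;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JsonRddToDataframe {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("JSON RDD to DataFrame")
        .master("local[*]") // assumption: local run, for illustration only
        .getOrCreate();

    // Hypothetical input: JSON documents as plain strings, no schema known up front
    JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
    JavaRDD<String> jsonRdd = jsc.parallelize(Arrays.asList(
        "{\"name\":\"alice\",\"age\":30}",
        "{\"name\":\"bob\",\"city\":\"Austin\"}"));

    // Wrap the strings in a Dataset<String>; spark.read().json() then infers the schema
    Dataset<String> jsonDs = spark.createDataset(jsonRdd.rdd(), Encoders.STRING());
    Dataset<Row> df = spark.read().json(jsonDs);

    df.printSchema(); // union of all fields seen across the documents
    df.show();
    spark.stop();
  }
}

The inferred schema is the union of the fields seen across all documents, so dynamic JSON still works; fields absent from a given document simply come back as null.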
From: kant kodali [mailto:kanth...@gmail.com]
Sent: Saturday, October 07, 2017 2:31 AM
To: user @spark
Subject: How to convert Array of Json rows into Dataset of specific columns in
Spark 2.2.0?
I
Do you have a little more to share with us?
Maybe you can set another TEMP directory. Are you getting a result?
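If it helps, a hedged sketch of pointing Spark at another scratch directory via spark.local.dir; the app name and path are hypothetical, and the directory must exist and be writable:

import org.apache.spark.sql.SparkSession;

public class TempDirDemo {
  public static void main(String[] args) {
    // spark.local.dir controls where Spark writes its scratch/temp files
    SparkSession spark = SparkSession.builder()
        .appName("TempDirDemo")
        .master("local[*]")                        // assumption: local run on the Windows box
        .config("spark.local.dir", "C:/spark-tmp") // hypothetical path; must exist and be writable
        .getOrCreate();
    spark.stop();
  }
}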
From: usa usa [mailto:usact2...@gmail.com]
Sent: Tuesday, October 03, 2017 10:50 AM
To: user@spark.apache.org
Subject: Spark 2.2.0 Win 7 64 bits Exception while deleting Spark temp dir
Sorry Steve - I may not have been very clear: I was thinking about
aws-java-sdk-z.yy.xxx.jar. To the best of my knowledge, none is bundled with
Spark.
From: Steve Loughran [mailto:ste...@hortonworks.com]
Sent: Tuesday, October 03, 2017 2:20 PM
To: JG Perrin <jper...@lumeris.com>
Cc
Thanks Yash… this is helpful!
From: Yash Sharma [mailto:yash...@gmail.com]
Sent: Tuesday, October 03, 2017 1:02 AM
To: JG Perrin <jper...@lumeris.com>; user@spark.apache.org
Subject: Re: Quick one... AWS SDK version?
Hi JG,
Here are my cluster configs if it helps.
Cheers.
EMR: emr
Hey Sparkians,
What version of AWS Java SDK do you use with Spark 2.2? Do you stick with the
Hadoop 2.7.3 libs?
Thanks!
jg
From: Steve Loughran [mailto:ste...@hortonworks.com]
Sent: Saturday, September 30, 2017 6:10 AM
To: JG Perrin <jper...@lumeris.com>
Cc: Alexander Czech <alexander.cz...@googlemail.com>; user@spark.apache.org
Subject: Re: HDFS or NFS as a cache?
On 29 Sep 2017, at 20:03, JG Perrin <jper...@lumeris.com> wrote:
@Anastasios: just a word of caution, this is the Spark 1.x CSV parser; there are
a few (minor) changes for Spark 2.x. You can have a look at
http://jgp.net/2017/10/01/loading-csv-in-spark/.
From: Anastasios Zouzias [mailto:zouz...@gmail.com]
Sent: Sunday, October 01, 2017 2:05 AM
To: Kanagha Kumar
The collect happens in the driver (often on the master), which then saves the
data; so for saving, you will not have to set up HDFS.
From: Alexander Czech [mailto:alexander.cz...@googlemail.com]
Sent: Friday, September 29, 2017 8:15 AM
To: user@spark.apache.org
Subject: HDFS or NFS as a cache?
I have
On a test system, you can also use something like Owncloud/Nextcloud/Dropbox to
ensure that the files are synchronized. I would not do it for TB of data ;) ...
-----Original Message-----
From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: Friday, September 29, 2017 5:14 AM
To: Gaurav1809
Maybe copy the model to each executor’s disk and load it from there? Depending
on how you use the data/model, using something like Livy and sharing the same
connection may help?
From: Naveen Swamy [mailto:mnnav...@gmail.com]
Sent: Wednesday, September 27, 2017 9:08 PM
To: user@spark.apache.org
As the others have mentioned, your loading time might kill your benchmark… I am
in a similar process right now, but I time each operation: load, process 1,
process 2, etc. It is not always easy with lazy operators, but you can try to
force execution with a dummy collect and cache (for benchmarking purposes).
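A rough sketch of that timing pattern in Java (Spark 2.2 style); the source path and filter are hypothetical stand-ins for your own load and process steps:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class StepTimer {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("StepTimer")
        .master("local[*]") // assumption: local run
        .getOrCreate();

    long t0 = System.currentTimeMillis();
    Dataset<Row> df = spark.read().parquet("data/input.parquet"); // hypothetical source
    df.cache();
    long rows = df.count(); // count() forces the otherwise-lazy load and fills the cache
    long t1 = System.currentTimeMillis();
    System.out.println("load: " + (t1 - t0) + " ms, " + rows + " rows");

    Dataset<Row> step1 = df.filter("amount > 0"); // hypothetical "process 1"
    step1.count();                                // force evaluation of this step alone
    long t2 = System.currentTimeMillis();
    System.out.println("process 1: " + (t2 - t1) + " ms");

    spark.stop();
  }
}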
Not using YARN, just a standalone cluster with 2 nodes here (physical, not even
VMs). The network seems good between the nodes.
From: ayan guha [mailto:guha.a...@gmail.com]
Sent: Tuesday, September 26, 2017 10:39 AM
To: JG Perrin <jper...@lumeris.com>
Cc: user@spark.apache.org
Subject: Re: Deb
Hi,
I get the infamous:
Initial job has not accepted any resources; check your cluster UI to ensure
that workers are registered and have sufficient resources
I run the app via Eclipse, connecting:
SparkSession spark = SparkSession.builder()
.appName("Converter -
Hi,
I have different files being dumped on S3; I want to ingest them and join them.
What sounds better to you: have one "directory" for all, or one per file
format?
If I have one directory for all, can you get some metadata about the file, like
its name?
If multiple directories, how can I
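One possibility, sketched below under the one-directory-per-format assumption: the built-in input_file_name() function tags each row with the file it came from (the bucket and prefix are hypothetical):

import static org.apache.spark.sql.functions.input_file_name;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class S3IngestWithFileName {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("S3 ingest")
        .master("local[*]") // assumption: local run
        .getOrCreate();

    // Hypothetical layout: one prefix ("directory") per file format
    Dataset<Row> json = spark.read().json("s3a://my-bucket/ingest/json/")
        .withColumn("source_file", input_file_name()); // records which file each row came from

    json.show(5, false);
    spark.stop();
  }
}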
Are you assuming that all partitions are of equal size? Did you try with more
partitions (like repartitioning)? Does the error always happen with the last
(or smallest) file? If you are sending to Redshift, why not use the JDBC driver?
-----Original Message-----
From: abbim
Have you tried the built-in parser instead of the Databricks one (which is not
really used anymore)?
What does your original CSV look like?
What does your code look like? There are quite a few options to read a CSV…
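For comparison, a minimal sketch using the built-in parser (Spark 2.x); the path, delimiter, and options are hypothetical and should be adjusted to the actual file:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CsvIngest {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("CSV ingest")
        .master("local[*]") // assumption: local run
        .getOrCreate();

    // Built-in parser: no com.databricks dependency needed since Spark 2.0
    Dataset<Row> df = spark.read()
        .format("csv")
        .option("header", "true")        // first line holds column names
        .option("inferSchema", "true")   // extra pass over the data to guess types
        .option("sep", ",")              // hypothetical delimiter; adjust to the actual file
        .load("data/input.csv");         // hypothetical path

    df.printSchema();
    spark.stop();
  }
}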
From: Aakash Basu [mailto:aakash.spark@gmail.com]
Sent: Sunday, September 03,
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
From: JG Perrin [mailto:jper...@lumeris.com]
Thanks Sam – this might be the solution. I will investigate!
From: Sam Elamin [mailto:hussam.ela...@gmail.com]
Sent: Monday, August 28, 2017 1:14 PM
To: JG Perrin <jper...@lumeris.com>
Cc: user@spark.apache.org
Subject: Re: from_json()
Hi jg,
Perhaps I am misunderstanding you, but if you
Is there a way to not have to specify a schema when using from_json(), or to
infer the schema? When you read a JSON doc from disk, you can infer the schema.
Should I write it to disk first (ouch)?
jg
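One workaround, sketched under the assumption of Spark 2.2: infer the schema once from the JSON strings themselves (no round-trip to disk) and then reuse it with from_json(); the column name and sample data are made up:

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.StructType;

public class FromJsonInferred {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("from_json inferred")
        .master("local[*]") // assumption: local run
        .getOrCreate();

    // Hypothetical frame with a string column "payload" holding JSON documents
    Dataset<Row> raw = spark.createDataset(
        Arrays.asList("{\"id\":1,\"tag\":\"a\"}", "{\"id\":2,\"tag\":\"b\"}"),
        Encoders.STRING()).toDF("payload");

    // Infer the schema once from the strings themselves, then reuse it with from_json()
    StructType schema = spark.read()
        .json(raw.select("payload").as(Encoders.STRING()))
        .schema();

    Dataset<Row> parsed = raw.withColumn("doc", from_json(col("payload"), schema));
    parsed.select("doc.id", "doc.tag").show();
    spark.stop();
  }
}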
Hey Mike,
You need to do it yourself, it’s really easy:
http://spark.apache.org/community.html.
hih
jg
From: Michael Artz [mailto:michaelea...@gmail.com]
Sent: Monday, August 28, 2017 7:43 AM
To: user@spark.apache.org
Subject: add me to email list
Hi,
Please add me to the email list
Mike
Thanks Michael – this is a great article… very helpful
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Wednesday, August 23, 2017 4:33 PM
To: JG Perrin <jper...@lumeris.com>
Cc: user@spark.apache.org
Subject: Re: Joining 2 dataframes, getting result as nested list/str
Hi folks,
I am trying to join 2 dataframes, but I would like to have the result as a list
of rows of the right dataframe (dDf in the example) in a column of the left
dataframe (cDf in the example). I made it work with one column, but I am having
issues adding more columns/creating a row(?).
Seq
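A possible way to get there, sketched with collect_list() over struct(): struct() packs several right-side columns into one nested row, and collect_list() gathers those rows into a list per key (the sample frames below are hypothetical stand-ins for cDf and dDf):

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.collect_list;
import static org.apache.spark.sql.functions.struct;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class NestedJoin {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("nested join")
        .master("local[*]") // assumption: local run
        .getOrCreate();

    // Hypothetical stand-ins for cDf (left) and dDf (right), joined on "id"
    Dataset<Row> cDf = spark.sql(
        "SELECT * FROM VALUES (1, 'a'), (2, 'b') AS c(id, name)");
    Dataset<Row> dDf = spark.sql(
        "SELECT * FROM VALUES (1, 'x', 10), (1, 'y', 20), (2, 'z', 30) AS d(id, item, qty)");

    // One nested row per right-side match, collected into a list column
    Dataset<Row> nested = cDf.join(dDf, "id")
        .groupBy(col("id"), col("name"))
        .agg(collect_list(struct(col("item"), col("qty"))).alias("d_rows"));

    nested.show(false);
    spark.stop();
  }
}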