I've been trying to obtain clarification on the terms of use regarding
repo.spark-packages.org. I emailed feedb...@spark-packages.org two weeks
ago, but have not heard back. Whom should I contact?
On Mon, Apr 26, 2021 at 8:13 AM Bo Zhang wrote:
> Hi Apache Spark users,
>
> As you might know,
-data-on-amazon-elastic-mapreduce/run-a-spark-job-within-amazon-emr-in-15-minutes-68b02af1ae16
EKS https://medium.com/@vikas.navlani/running-spark-on-aws-eks-1cd4c31786c
Richard
On 19/02/2024 13:36, Jagannath Majhi wrote:
Dear Spark Community,
I hope this email finds you well. I am reaching out
explicitly use a forward slash if the path contains gs:
and the job now runs successfully.
Richard
Saw
Sent from Yahoo Mail for iPhone
On Wednesday, July 8, 2020, 9:26 PM, Sricheta Ruj
wrote:
Hello Spark Team
I am trying to use the DataSourceV2 API from Spark 3.0. I wanted to ask: in the
case of write, how do I get the user-specified schema?
This is what I am trying to
What's the right way to use Structured Streaming with both state and windows?
Looking at the slides from
https://www.slideshare.net/databricks/arbitrary-stateful-aggregations-using-structured-streaming-in-apache-spark
slides 26 and 31, it looks like stateful processing events for every device
h.
Thanks,
Richard
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Fri, 19 Jul 2019 at 23:17, Richard wrote:
>
>> Thanks for the reply,
>> my situation is li
at 2:26 PM Mich Talebzadeh
wrote:
> Hi Richard,
>
> You can use the following to read JSON data into DF. The example is
> reading JSON from Kafka topic
>
> val sc = spark.sparkContext
> import spark.implicits._
> // Use map to create the new RDD
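For reference, a minimal sketch of the same idea using the built-in Kafka source plus from_json (topic, bootstrap servers, and the two-field schema are placeholders, not from this thread):

    import org.apache.spark.sql.functions.from_json
    import org.apache.spark.sql.types._

    val jsonSchema = new StructType().add("foo", StringType).add("bar", StringType)
    val jsonDf = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "my_topic")
      .load()
      .selectExpr("CAST(value AS STRING) AS json")
      .select(from_json($"json", jsonSchema).as("data"))
      .select("data.*")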
let's say I use spark to migrate some data from Cassandra table to Oracle
table
Cassandra Table:
CREATE TABLE SOURCE(
id UUID PRIMARY KEY,
col1 text,
col2 text,
jsonCol text
);
example jsonCol value: {"foo": "val1", "bar": "val2"}
I am trying to extract fields from the json column while importing
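A minimal sketch of one way to do this (keyspace, table, column names, and the Oracle JDBC URL are placeholders; the lower-cased jsoncol column name is an assumption about how Cassandra stores the unquoted identifier):

    import org.apache.spark.sql.functions.{col, get_json_object}

    val src = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_ks", "table" -> "source"))
      .load()

    val out = src
      .withColumn("foo", get_json_object(col("jsoncol"), "$.foo"))
      .withColumn("bar", get_json_object(col("jsoncol"), "$.bar"))

    out.write
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/SERVICE")
      .option("dbtable", "TARGET_TABLE")
      .option("user", "user")
      .option("password", "***")
      .save()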
and btw, same connection string works fine when used in SQL Developer.
On Tuesday, June 18, 2019, 03:49:24 PM PDT, Richard Xin
wrote:
HI, I need help with tcps oracle connection from spark (version:
spark-2.4.0-bin-hadoop2.7)
Properties prop = new Properties();prop.putAll(sparkOracle
HI, I need help with tcps oracle connection from spark (version:
spark-2.4.0-bin-hadoop2.7)
Properties prop = new Properties();
prop.putAll(sparkOracle); // username/password
prop.put("javax.net.ssl.trustStore", "path to root.jks");
prop.put("javax.net.ssl.trustStorePassword", "password_here");
Sent from Yahoo Mail for iPhone
On Monday, May 6, 2019, 18:34, Russell Spitzer
wrote:
Scala version mismatch -
Spark is shown at 2.12; the connector only has a 2.11 release.
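A sketch of the fix in sbt terms (versions taken from the thread; %% appends the project's Scala binary version, so every artifact stays on 2.11, which the connector does ship):

    scalaVersion := "2.11.12"

    libraryDependencies ++= Seq(
      "org.apache.spark"   %% "spark-core" % "2.4.0",
      "org.apache.spark"   %% "spark-sql"  % "2.4.0",
      "com.datastax.spark" %% "spark-cassandra-connector" % "2.4.1"
    )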
On Mon, May 6, 2019, 7:59 PM Richard Xin
wrote:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.12</artifactId>
  <version>2.4.0</version>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.12</artifactId>
  <version>2.4.0</version>
</dependency>
<dependency>
  <groupId>com.datastax.spark</groupId>
  <artifactId>spark-cassandra-connector_2.11</artifactId>
  <version>2.4.1</version>
</dependency>
When I run spark-submit I got the following exceptions on Spark 2.4.2; it works
fine when running spark-submit under
/latest.html
Richard L. Garris
Director of Field Engineering
Databricks, Inc.
rich...@databricks.com
Mobile: 650.200.0840
databricks.com
<http://databricks.com/>
On Fri, Mar 1, 2019 at 2:21 AM Nuno Silva
wrote:
> Hi,
>
> Not sure if I'm delivering my request through the right
from the existing documentation.
Regards,
Richard
On Fri, 17 Aug 2018 at 15:33, Maximiliano Patricio Méndez <
mmen...@despegar.com> wrote:
> Hi,
>
> I've added table level security using spark extensions based on the
> ongoing work proposed for ranger in RANGER-2128. Following the
Hi,
I'd like to implement some kind of row-level security and am thinking of
adding additional filters to the logical plan possibly using the Spark
extensions.
Would this be feasible, for example using the injectResolutionRule?
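For reference, a minimal no-op sketch of what wiring such a rule in through the extensions could look like (class name and rule body are made up):

    import org.apache.spark.sql.SparkSessionExtensions
    import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    import org.apache.spark.sql.catalyst.rules.Rule

    class RowFilterExtensions extends (SparkSessionExtensions => Unit) {
      override def apply(ext: SparkSessionExtensions): Unit = {
        ext.injectResolutionRule { session =>
          new Rule[LogicalPlan] {
            override def apply(plan: LogicalPlan): LogicalPlan = {
              // match the relations of interest here and wrap them in a Filter;
              // returning the plan unchanged keeps this a no-op stub
              plan
            }
          }
        }
      }
    }
    // enabled with spark.sql.extensions=com.example.RowFilterExtensions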
thanks in advance,
Richard
Would you mind sharing your code with us to analyze?
> On Feb 10, 2018, at 10:18 AM, amit kumar singh wrote:
>
> Hi Team,
>
> We have hive external table which has 50 tb of data partitioned on year
> month day
>
> I want to move the last 2 months of data into another table
>
I can't find a good source of documentation, but the source code
"org.apache.spark.sql.execution.streaming.ProgressReporter" is helpful for
answering some of them.
For example:
inputRowsPerSecond = numRecords / inputTimeSec,
processedRowsPerSecond = numRecords / processingTimeSec
This is explaining
In a situation where multiple workflows write different partitions of the
same table.
Example:
10 Different processes are writing parquet or orc files for different
partitions of the same table foo, at
> Do you have any opinion for the solution. I really appreciate
>
>
>
> Onur EKİNCİ
> Bilgi Yönetimi Yöneticisi
> Knowledge Management Manager
>
> m:+90 553 044 2341 d:+90 212 329 7000
>
> İTÜ Ayazağa Kampüsü, Teknokent ARI4 Binası 34469 Maslak İs
Curious you are using"jdbc:sqlserve" to connect oracle, why?
Also kindly reminder scrubbing your user id password.
Sent from my iPhone
> On Jan 16, 2018, at 03:00, Onur EKİNCİ wrote:
>
> Hi,
>
> We are trying to get data from an Oracle database into Kinetica database
Greetings,
In version 1.6.0, is it possible to write a partitioned dataframe into
parquet format using a UDF function on the partition column? I'm using
pyspark.
Let's say I have a dataframe with column `date`, of type string or int, which
contains values such as `20170825`. Is it possible to
I'm trying to locate four independent contractors who have experience with
Spark. I'm not sure where I can go to find experienced Spark consultants.
Please, no recruiters.
--
-Richard L. Burton III
-deep-learning/blob/f088de45daec06865ac02a9ec1323eb2c9eebb89/src/main/scala/com/databricks/sparkdl/ImageUtils.scala
You can reuse this code potentially.
Richard Garris
Principal Architect
Databricks, Inc
650.200.0840
rlgar...@databricks.com
On December 17, 2017 at 3:12:41 PM, Don Drake (dondr
storing
it as a vector or Array vs a large Java class object?
That might be the more prudent approach.
-RG
Richard Garris
Principal Architect
Databricks, Inc
650.200.0840
rlgar...@databricks.com
On December 14, 2017 at 10:23:00 AM, Marcelo Vanzin (van...@cloudera.com)
wrote:
This sounds like
Hi,
would it be possible to determine the Cook's distance using Spark?
thanks,
Richard
to RUNNING (even if 1 executor was allocated)?
“
Best Regards
Richard
On 12/7/17, 2:40 PM, "bsikander" <behro...@gmail.com> wrote:
Marcelo Vanzin wrote
> I'm not sure I follow you here. This is something that you are
> defining, not Spark.
Yes, you are right.
is submitted
to executors”. With this concept, you may define your own status.
Best Regards
Richard
On 12/4/17, 4:06 AM, "bsikander" <behro...@gmail.com> wrote:
So, I tried to use SparkAppHandle.Listener with SparkLauncher as you
suggested. The behavior of Launcher is not
Kant, right, we cannot use the Driver's producer in the executors. That's why I
mentioned "kafka sink" to solve it.
This article should be helpful about it
https://allegro.tech/2015/08/spark-kafka-integration.html
Best Regards
Richard
From: kant kodali <kanth...@gmail.com>
Date: Thursday, December 7
;>("topicA", gson.toJson(map))); //
send smaller json in a task
}
}
});
When you do it, make sure the Kafka producer (see "kafka sink" for it) and gson's
environment are set up correctly in the executors.
If after this there is still OOM, let's discuss further.
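A minimal sketch of the "kafka sink" idea from the article above (broadcast a serializable factory and create the producer lazily once per executor JVM, instead of shipping the driver's producer):

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    class KafkaSink(createProducer: () => KafkaProducer[String, String]) extends Serializable {
      lazy val producer = createProducer()   // built on first use, on the executor
      def send(topic: String, value: String): Unit =
        producer.send(new ProducerRecord[String, String](topic, value))
    }

    object KafkaSink {
      def apply(config: Properties): KafkaSink =
        new KafkaSink(() => new KafkaProducer[String, String](config))
    }

    // driver side (config must carry bootstrap servers and serializers):
    // val kafkaSink = sc.broadcast(KafkaSink(config))
    // rdd.foreach(rec => kafkaSink.value.send("topicA", gson.toJson(rec)))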
Best Regar
Are you now building your app using spark 2.2 or 2.1?
Best Regards
Richard
From: Imran Rajjad <raj...@gmail.com>
Date: Wednesday, December 6, 2017 at 2:45 AM
To: "user @spark" <user@spark.apache.org>
Subject: unable to connect to connect to cluster 2.2.0
Hi,
Recent
In the 2nd case, is there any producer’s error thrown in executor’s log?
Best Regards
Richard
From: kant kodali <kanth...@gmail.com>
Date: Tuesday, December 5, 2017 at 4:38 PM
To: "Qiao, Richard" <richard.q...@capitalone.com>
Cc: "user @spark" <user@spark
Where do you check the output result for both cases?
Sent from my iPhone
> On Dec 5, 2017, at 15:36, kant kodali wrote:
>
> Hi All,
>
> I have a simple stateless transformation using Dstreams (stuck with the old
> API for one of the Application). The pseudo code is rough
It works to collect job-level metrics through the Jolokia Java agent.
Best Regards
Richard
From: Nick Dimiduk <ndimi...@gmail.com>
Date: Monday, December 4, 2017 at 6:53 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Access to Applications metrics
Bump.
On Wed
Junfeng, it's worth a try to start your local Spark with the Hadoop Windows support
package (hadoop.dll/winutils.exe etc.) in HADOOP_HOME, if you didn't do that yet.
Best Regards
Richard
From: Junfeng Chen <darou...@gmail.com>
Date: Monday, December 4, 2017 at 3:53 AM
To: "Qiao, Richard&
It seems a common mistake that the path is not accessible by workers/executors.
Best regards
Richard
Sent from my iPhone
On Dec 3, 2017, at 22:32, Junfeng Chen
<darou...@gmail.com<mailto:darou...@gmail.com>> wrote:
I am working on importing snappy compressed json file in
Sourav:
I’m using spark streaming 2.1.0 and can confirm
spark.dynamicAllocation.enabled is enough.
Best Regards
Richard
From: Sourav Mazumder <sourav.mazumde...@gmail.com>
Date: Sunday, December 3, 2017 at 12:31 PM
To: user <user@spark.apache.org>
Subject: Dyna
)
Best Regards
Richard
From: venkat <meven...@gmail.com>
Date: Thursday, November 30, 2017 at 8:16 PM
To: Cody Koeninger <c...@koeninger.org>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: [Spark streaming] No assigned partition error during seek
I noti
logs I see many ProvisionedThroughputExceededException
however this should be benign in that the KCL should retry those records.
Unfortunately I am not seeing the missing records processed at a later date.
Where to look next?
. . . . . . . . . . . . . . . . . . . . . . . . . . .
Richard M
so let's say I have a chained path in
spark.driver.extraClassPath/spark.executor.extraClassPath such as
/path1/*:/path2/*, and I have different versions of the same jar under those 2
directories. How does Spark pick which version of the jar to use - from /path1/*?
Thanks.
Can we add extra libraries (jars on S3) to spark-submit? If yes, how - e.g.
--jars, extraClassPath, extraLibPath?
Thanks,
Richard
I believe you could use JOLT (bazaarvoice/jolt) to flatten it to a json string
and then to dataframe or dataset.
bazaarvoice/jolt - JSON to JSON transformation library written in Java.
On Monday, July 17, 2017, 11:18:24 PM PDT, Chetan
ort()
.getOrCreate()
. . . . . . . . . . . . . . . . . . . . . . . . . . .
Richard Moorhead
Software Engineer
richard.moorh...@c2fo.com<mailto:richard.moorh...@gmail.com>
C2FO: The World's Market for Working Capital®
Set your master to local[10]; you are only allocating one core currently.
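For example (builder-style; the app name is a placeholder):

    val spark = org.apache.spark.sql.SparkSession.builder()
      .master("local[10]")
      .appName("my-app")
      .getOrCreate()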
. . . . . . . . . . . . . . . . . . . . . . . . . . .
Richard Moorhead
Software Engineer
richard.moorh...@c2fo.com<mailto:richard.moorh...@gmail.com>
C2FO: The World's Market for Working Capital®
active and collaborative documents
with SQL ...
. . . . . . . . . . . . . . . . . . . . . . . . . . .
Richard Moorhead
Software Engineer
richard.moorh...@c2fo.com<mailto:richard.moorh...@gmail.com>
C2FO: The World's Market for Working Capital®
operations?
logger.info(s"RDD LENGTH: ${events.count}")
//nullpointer exception on call to .map
val df = events.map(e => {
...
}
}
}
. . . . . . . . . . . . . . . . . . . . . . . . . . .
Richard Moorhead
Software Engineer
richard.moorh...@c2fo.com<mailto:richard
I have a streaming job which writes data to S3. I know there are saveAs
functions helping write data to S3. But it bundles all elements then writes out
to S3. So my first question - Is there any way to let saveAs functions
write data in batch or single elements instead of whole bundle?
I'm also interested in this, does anyone know?
On 17 April 2017 at 17:17, Vishnu Viswanath
wrote:
> Hello All,
>
> Does anyone know if the skew handling code mentioned in this talk
> https://www.youtube.com/watch?v=bhYV0JOPd9Y was added to spark?
>
> If so can I
I am playing with some data using (stand alone) spark-shell (Spark version
1.6.0) by executing `spark-shell`. The flow is simple; a bit like cp -
basically moving local 100k files (the max size is 190k) to S3. Memory is
configured as below
export SPARK_DRIVER_MEMORY=8192M
export
specializations for Long keys which happen to perform
not very well on some specific distributions. Does anyone have ideas about
this?
Best wishes,
Richard
// lines of word IDs
import scala.util.Random
val data = (1 to 5000).par.map({ _ =>
  (1 to 1000) map { _ => (-1000 * Math.log(Random.nextDouble)).toInt }
}).seq
//
JavaRDD jsonRDD =
    new JavaSparkContext(sparkSession.sparkContext()).parallelize(results);
Dataset peopleDF = sparkSession.createDataFrame(jsonRDD, Row.class);
Richard Xin
On Tuesday, March 28, 2017 7:51 AM, Karin Valisova <ka...@datapine.com>
wrote:
maybe Apache Ignite does fit your requirements
On 15 March 2017 at 08:44, vincent gromakowski <
vincent.gromakow...@gmail.com> wrote:
> Hi
> If queries are statics and filters are on the same columns, Cassandra is a
> good option.
>
> On 15 Mar 2017 at 7:04 AM, "muthu" wrote
I think it's difficult to determine with certainty if a variable is
continuous or categorical when the values are numbers like 1, 2, 2, 3, 4, 5.
These values can be either continuous or categorical.
for exa
However you could perform some checks:
- are there any decimal values > it will
try
Row newRow = RowFactory.create(row.getString(0), row.getString(1),
row.getMap(2));
On Friday, January 27, 2017 10:52 AM, Ankur Srivastava
wrote:
+ DEV Mailing List
On Thu, Jan 26, 2017 at 5:12 PM, Ankur Srivastava
wrote:
Hi,
haven't used it, but Jackcess should do the trick >
http://jackcess.sourceforge.net/
kind regards,
Richard
2017-01-25 11:47 GMT+01:00 Selvam Raman <sel...@gmail.com>:
>
>
> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>
I found contradictions between the 1.6.0 and 2.1.x documentation.
In
http://spark.apache.org/docs/1.6.0/api/scala/index.html#org.apache.spark.sql.DataFrameWriter
it says: "This is only applicable for Parquet at the moment."
in
here - http://spark.apache.org/docs/latest/tuning.html
Cheers,
Richard
https://richardstartin.com/
From: Enrico DUrso <enrico.du...@everis.com>
Sent: 10 January 2017 11:10
To: user@spark.apache.org
Subj
changes
of behaviour or changes in the build process or something like that,
kind regards,
Richard
On 9 January 2017 at 22:55, Richard Siebeling <rsiebel...@gmail.com> wrote:
> Hi,
>
> I'm setting up Apache Spark 2.1.0 on Mesos and I am getting a "Could not
> p
e the same
configuration but using a Spark 2.0.0 is running fine within Vagrant.
Could someone please help?
thanks in advance,
Richard
Why not do that with spark sql to utilise the executors properly, rather than a
sequential filter on the driver.
SELECT * FROM A LEFT JOIN B ON A.fk = B.fk WHERE B.pk IS NULL LIMIT k
If you were sorting just so you could iterate in order, this might save you a
couple of sorts too.
Thanks, I have seen this, but it doesn't cover my question.
What I need is to read JSON and include the raw JSON as part of my dataframe.
On Friday, December 30, 2016 10:23 AM, Annabel Melongo
<melongo_anna...@yahoo.com.INVALID> wrote:
Richard,
Below documentation will show you how to
SparkConf sparkConf = new SparkConf().setMaster("local[2]").setAppName("json_test");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
HiveContext hc = new HiveContext(ctx.sc());
DataFrame df = hc.read().json("files/json/example2.json");
what I need is a DataFrame with columns id, ln, fn, age as well as the raw_json
string.
Any advice on the best practice in Java?
Thanks,
Richard
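One possible route, sketched in Scala (the Java API is analogous; assumes Spark 2.x and one JSON object per line): read the file as text so each row keeps the raw string, then pull out the fields you need.

    import org.apache.spark.sql.functions.{col, get_json_object}

    val raw = spark.read.text("files/json/example2.json").toDF("raw_json")
    val df = raw
      .withColumn("id",  get_json_object(col("raw_json"), "$.id"))
      .withColumn("ln",  get_json_object(col("raw_json"), "$.ln"))
      .withColumn("fn",  get_json_object(col("raw_json"), "$.fn"))
      .withColumn("age", get_json_object(col("raw_json"), "$.age"))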
try this:
JavaRDD mapr = listrdd.map(x -> broadcastVar.value().get(x));
On Wednesday, December 21, 2016 2:25 PM, Sateesh Karuturi
wrote:
I need to process spark Broadcast variables using Java RDD API. This is my
code what i have tried so far:This is only
I think limit repartitions your data into a single partition if called as a non
terminal operator. Hence zip works after limit because you only have one
partition.
In practice, I have found joins to be much more applicable than zip because of
the strict limitation of identical partitions.
I am not sure I understood your logic, but it seems to me that you could take a
look at Hive's Lead/Lag functions.
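A minimal sketch of lead/lag via Spark window functions (df, id, and ts are assumed column names, not from this thread):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{lag, lead}

    val w = Window.partitionBy("id").orderBy("ts")
    val withNeighbours = df
      .withColumn("prev_ts", lag("ts", 1).over(w))
      .withColumn("next_ts", lead("ts", 1).over(w))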
On Monday, December 19, 2016 1:41 AM, Milin korath
wrote:
thanks, I tried with left outer join. My dataset having around 400M records
and lot
rn row;
}
From: Richard Xin <richardxin...@yahoo.com>
Sent: Saturday, December 17, 2016 8:53 PM
To: Yong Zhang; zjp_j...@163.com; user
Subject: Re: Java to show struct field from a Dataframe

I tried to transform
root
|-- latitude: double (nullable = false)
|-- longitude: double (null
;).schema().printTreeString();
// prints schema tree OK as expected
transformedDf.show(); // java.lang.ClassCastException: [D cannot be
cast to java.lang.Double
Seems to me that the return type of the UDF2 might be the root cause, but I'm
not sure how to correct it.
Thanks,
Richard
On
I think the cause is your invalid Double data - have you checked your data?
zjp_j...@163.com
From: Richard Xin
Date: 2016-12-17 23:28
To: User
Subject: Java to show struct field from a Dataframe
let's say I
let's say I have a DataFrame with the following schema:
root
|-- name: string (nullable = true)
|-- location: struct (nullable = true)
| |-- longitude: double (nullable = true)
| |-- latitude: double (nullable = true)
df.show(); throws following exception:
java.lang.ClassCastException:
iate function
import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;
udf should be callUDF, e.g.
ds.withColumn("localMonth", callUDF("toLocalMonth", col("unixTs"), col("tz")))
On 17 December 2016 at 09:54, Richa
what I am trying to do: I need to add a column (which could be a complicated
transformation based on the value of a column) to a given dataframe.
scala script:
val hContext = new HiveContext(sc)
import hContext.implicits._
val df = hContext.sql("select x,y,cluster_no from test.dc")
val len = udf((str: String) =>
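A minimal sketch of how such a UDF-derived column can be added (the udf body and new column name are stand-ins for the original, truncated transformation):

    import org.apache.spark.sql.functions.udf

    val len = udf((str: String) => if (str == null) 0 else str.length)
    val withLen = df.withColumn("x_len", len($"x"))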
Ok it looks like I could reconstruct the logic in the Spark UI from the /jobs
resource. Thanks.
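For example (host, port, and application id are placeholders; 4040 is the default driver UI port), the data behind the UI is served as JSON by the monitoring REST API:

    val jobsJson = scala.io.Source
      .fromURL("http://localhost:4040/api/v1/applications/app-20161207000101-0001/jobs")
      .mkString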
https://richardstartin.com/
From: map reduced <k3t.gi...@gmail.com>
Sent: 07 December 2016 19:49
To: Richard Startin
Cc: user@spark.apache.org
Subject: Re:
Is there any way to get this information as CSV/JSON?
https://docs.databricks.com/_images/CompletedBatches.png
https://richardstartin.com/
From: Richard Startin <richardstar...@outlook.com>
Se
help react
quickly to increased/reduced capacity.
spark.streaming.backpressure.pid.minRate - the default value is 100 (must be
positive), batch size won't go below this.
spark.streaming.receiver.maxRate - batch size won't go above this.
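For example, set on the SparkConf (values here are illustrative, not recommendations):

    val conf = new org.apache.spark.SparkConf()
      .set("spark.streaming.backpressure.enabled", "true")
      .set("spark.streaming.backpressure.pid.minRate", "100")
      .set("spark.streaming.receiver.maxRate", "10000")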
Cheers,
Richard
https://richards
Is there any way to get a more computer-friendly version of the completed
batches section of the streaming page of the application master? I am very
interested in the statistics and am currently screen-scraping...
https://richardstartin.com
There is a great write up on Livy at
http://henning.kropponline.de/2016/11/06/
On 5 Dec 2016, at 14:34, Mich Talebzadeh
> wrote:
Hi,
Has there been any experience using Livy with Spark to share multiple Spark
contexts?
thanks
Dr
Hi Frank,
Two suggestions
1. I would recommend caching the corpus prior to running LDA
2. If you are using EM I would tweak the sample size using the
setMiniBatchFraction parameter to decrease the sample per iteration.
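A minimal sketch of both points (assumes an MLlib corpus of RDD[(Long, Vector)]; note that setMiniBatchFraction lives on the online optimizer, and k is illustrative):

    import org.apache.spark.mllib.clustering.{LDA, OnlineLDAOptimizer}

    corpus.cache()  // (1) cache the corpus before running LDA
    val lda = new LDA()
      .setK(20)
      .setOptimizer(new OnlineLDAOptimizer().setMiniBatchFraction(0.05))  // (2) smaller sample per iteration
    val model = lda.run(corpus)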
-Richard
On Tue, Sep 20, 2016 at 10:27 AM, Frank Zhang <
datami
it
with Memory, SSD, and/or HDDs with the DFS as the persistent store, called
under-filesystem.
Hope this helps.
Richard Catlin
> On Sep 19, 2016, at 7:56 AM, aka.fe2s <aka.f...@gmail.com> wrote:
>
> Hi folks,
>
> What has happened with Tachyon / Alluxio in Spark 2? Doc doesn't me
> Begin forwarded message:
>
> From: "Chen, Kevin"
> Subject: Re: Missing output partition file in S3
> Date: September 19, 2016 at 10:54:44 AM PDT
> To: Steve Loughran
> Cc: "user@spark.apache.org"
>
> Hi Steve,
>
>
).
The analytic functions could help when gathering the statistics over the
whole set,
kind regards,
Richard
On Wed, Aug 24, 2016 at 10:54 PM, Mich Talebzadeh <mich.talebza...@gmail.com
> wrote:
> Hi Richard,
>
> can you use analytics functions for this purpose on DF
>
> HTH
&
regards,
Richard
On Wed, Aug 24, 2016 at 6:52 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:
> Hi Richard,
>
> What is the business use case for such statistics?
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn *
>
, is that possible?
We could sacrifice a little bit of performance (but not too much), that's
why we prefer one pass...
Is this possible in the standard Spark or would this mean modifying the
source a little bit and recompiling? Is that feasible / wise to do?
thanks in advance,
Richard
I was using the 1.1 driver. I upgraded that library to 2.1 and it resolved my
problem.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/HiveThriftServer-and-spark-sql-hive-thriftServer-singleSession-setting-tp27340p27566.html
Sent from the Apache Spark User
I'm attempting to access a dataframe from jdbc:
However this temp table is not accessible from beeline when connected to
this instance of HiveServer2.
--
View this message in context:
How are you calling registerTempTable from hiveContext? It appears to be a
private method.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Table-registered-using-registerTempTable-not-found-in-HiveContext-tp26555p27514.html
Sent from the Apache Spark User
I am running HiveServer2 as well and when I connect with beeline I get the
following:
org.apache.spark.sql.internal.SessionState cannot be cast to
org.apache.spark.sql.hive.HiveSessionState
Do you know how to resolve this?
--
View this message in context:
Fixed! After adding the option -DskipTests everything builds OK.
Thanks Sean for your help
On Thu, Aug 4, 2016 at 8:18 PM, Richard Siebeling <rsiebel...@gmail.com>
wrote:
> I don't see any other errors, these are the last lines of the
> make-distribution log.
> Ab
016 at 6:30 PM, Sean Owen <so...@cloudera.com> wrote:
> That message is a warning, not error. It is just because you're cross
> compiling with Java 8. If something failed it was elsewhere.
>
>
> On Thu, Aug 4, 2016, 07:09 Richard Siebeling <rsiebel...@gmail.com> w
--tgz -Pyarn -Phadoop-2.7
-Dhadoop.version=2.7.0-mapr-1602
It fails with the error "bootstrap class path not set in conjunction with
-source 1.7"
Could you please help? I do not know what this error means,
thanks in advance,
Richard
Is there a proper way to make or get an Encoder for Option in Spark 2.0?
There isn't one by default and while ExpressionEncoder from catalyst will
work, it is private and unsupported.
--
*Richard Marscher*
Senior Software Engineer
Localytics
Localytics.com <http://localytics.com/> | Ou
I believe it depends on your Spark application.
To write to Hive, use
dataframe.saveAsTable
To write to S3, use
dataframe.write.parquet("s3://")
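For example (table name, path, and save mode are placeholders):

    df.write.mode("overwrite").saveAsTable("mydb.my_table")         // Hive table
    df.write.mode("overwrite").parquet("s3a://my-bucket/some/path") // Parquet on S3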
Hope this helps.
Richard
> On Jun 16, 2016, at 9:54 AM, Natu Lauchande <nlaucha...@gmail.com> wrote:
>
> Does
e - I do not see a
>> simple reduceByKey replacement.
>>
>> Regards,
>>
>> Bryan Jeffrey
>>
>>
>
--
*Richard Marscher*
Senior Software Engineer
Localytics
Localytics.com <http://localytics.com/> | Our Blog
<http://localytics.com/blog> | Twitter <http://twitter.com/localytics> |
Facebook <http://facebook.com/localytics> | LinkedIn
<http://www.linkedin.com/company/1148792?trk=tyah>
wrote:
> That kind of stuff is likely fixed in 2.0. If you can get a reproduction
> working there it would be very helpful if you could open a JIRA.
>
> On Mon, Jun 6, 2016 at 7:37 AM, Richard Marscher <rmarsc...@localytics.com
> > wrote:
>
>> A quick unit test
ault.
>
> That said, I would like to enable that kind of sugar while still taking
> advantage of all the optimizations going on under the covers. Can you get
> it to work if you use `as[...]` instead of `map`?
>
> On Wed, Jun 1, 2016 at 11:59 AM, Richard Marscher &l
1, 2016 at 1:42 PM, Michael Armbrust <mich...@databricks.com>
wrote:
> Thanks for the feedback. I think this will address at least some of the
> problems you are describing: https://github.com/apache/spark/pull/13425
>
> On Wed, Jun 1, 2016 at 9:58 AM, Richard Marscher <rma
to -1 instead of null. Now it's
completely ambiguous what data in the join was actually there versus
populated via this atypical semantic.
Are there additional options available to work around this issue? I can
convert to RDD and back to Dataset but that's less than ideal.
Thanks,
--
*Richard
Well the task itself is completed (it indeed gives a result), but the tasks
in Mesos say killed and it gives an error: Remote RPC client
disassociated. Likely due to containers exceeding thresholds, or network
issues.
Kind regards,
Richard
On Monday, 16 May 2016, Jacek Laskowski <
B.t.w. this is on a single node cluster
On Sunday, 15 May 2016, Richard Siebeling <rsiebel...@gmail.com> wrote the
following:
> Hi,
>
> I'm getting the following errors running SparkPi on a clean just compiled
> and checked Mesos 0.29.0 installation with Spark 1.6.1
>
.
Please help,
thanks in advance,
Richard
The complete logs are
sudo ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master
mesos://192.168.33.10:5050 --deploy-mode client ./lib/spark-examples* 10
16/05/15 23:05:36 WARN NativeCodeLoader: Unable to load native-hadoop
library for your