For GraphX, you should be able to read and write data from practically any
datastore Spark supports: flat files, RDBMS, Hadoop, etc. If you want to
save your graph as-is, check out something like Neo4j.
http://neo4j.com/developer/apache-spark/
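If you just want to persist the graph to a filesystem instead, a minimal sketch (assuming String vertex attributes and Int edge attributes; the path is a placeholder) is to save and reload the underlying vertex and edge RDDs:

import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph}

// Save the two RDDs that make up the graph.
def saveGraph(graph: Graph[String, Int], path: String): Unit = {
  graph.vertices.saveAsObjectFile(path + "/vertices")
  graph.edges.saveAsObjectFile(path + "/edges")
}

// Rebuild the graph from the saved RDDs.
def loadGraph(sc: SparkContext, path: String): Graph[String, Int] = {
  val vertices = sc.objectFile[(Long, String)](path + "/vertices")
  val edges = sc.objectFile[Edge[Int]](path + "/edges")
  Graph(vertices, edges)
}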
Best Regards,
Sonal
Founder, Nube Technologies
I have noticed that the UI takes some time to reflect the requested changes. Is
that the issue? Have you tried waiting for a few minutes after killing the
Spark job from the terminal?
Kind Regards,
Sudhanshu
On 23 Nov 2015, at 1:43 a.m., Ted Yu wrote:
Hi Soni,
I think you need to start the JavaStreamingContext. Add something like this
at the end of your program:
jssc.start();             // start the streaming computation
jssc.awaitTermination();  // block until the computation is stopped or fails
jssc.stop();              // release resources (a no-op if already stopped)
- Yogesh
On Thu, Nov 19, 2015 at 12:34 PM, Soni spark wrote:
> Dear Friends,
>
> I am
Vipul,
Not sure I understand your question. DataFrames are immutable; you can't
update a DataFrame in place. Every transformation returns a new DataFrame.
Could you paste some log info for the OOM error?
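For illustration, a minimal sketch of that pattern (the table and column names are made up):

import org.apache.spark.sql.functions.col

val df = sqlContext.table("people")                 // hypothetical source table
val updated = df.withColumn("age", col("age") + 1)  // new DataFrame; df is unchanged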
-----Original Message-----
From: vipulrai [mailto:vipulrai8...@gmail.com]
Sent: Friday, November 20, 2015 12:11 PM
To: user@spark.apache.org
There is a way; please see the Scala docs:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions
The first version of updateStateByKey listed there has the parameter "initialRDD".
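A minimal sketch of that overload, assuming a StreamingContext named ssc and a DStream[(String, Long)] named pairStream (the seed values are made up):

import org.apache.spark.HashPartitioner

val initialState = ssc.sparkContext.parallelize(Seq(("a", 10L), ("b", 5L)))

val updateFunc = (values: Seq[Long], state: Option[Long]) =>
  Some(values.sum + state.getOrElse(0L))

// Seeds the state so the first batch starts from initialState, not from zero.
val stateStream = pairStream.updateStateByKey(
  updateFunc,
  new HashPartitioner(ssc.sparkContext.defaultParallelism),
  initialState)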
On Fri, Nov 20, 2015 at 6:52 PM, Bryan wrote:
It seems like the problem is related to --executor-cores. Is there possibly some
sort of race condition when using multiple cores per executor?
On Nov 22, 2015, at 12:38 PM, Jeremy Davis wrote:
Hello,
I’m at a loss trying to diagnose why
Thanks Michael
Thanks for the response. Here is my understanding; correct me if I am wrong:
1) Partitioned tables written by Spark SQL do not write metadata to the Hive
metastore. Spark SQL discovers partitions from the table location on the
underlying DFS, not from the metastore. It does this the
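For reference, a minimal sketch of that write-then-discover pattern (the path and the dt column are illustrative):

// Write a partitioned table; each dt value becomes a directory such as
// /data/events/dt=2015-11-20/.
df.write.partitionBy("dt").parquet("/data/events")

// Read it back: Spark SQL infers the dt partition column from the
// directory layout, not from the Hive metastore.
val events = sqlContext.read.parquet("/data/events")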
I am currently using updateStateByKey (which, as you pointed out, allows the
introduction of an initial RDD) to introduce an initial RDD to my window
counting function. I was hoping to essentially seed the window state at startup
without the use of updateStateByKey, to avoid the associated cost.
>> and then use Hive's dynamic partitioned insert syntax
What does this entail? The same SQL, but you need to run
set hive.exec.dynamic.partition = true;
in the Hive/SQL context (along with several other related dynamic partition
settings), as sketched below.
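A minimal sketch of that through a HiveContext (the table and column names are made up):

// Enable dynamic partitioning, then let Hive derive the partition value
// from the trailing column of the SELECT.
hiveContext.sql("SET hive.exec.dynamic.partition = true")
hiveContext.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
hiveContext.sql(
  """INSERT INTO TABLE events PARTITION (dt)
    |SELECT user_id, action, dt FROM staging_events""".stripMargin)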
Is there anything else/special
Currently the width, if truncation is performed, is hardcoded to 20 characters.
I wonder whether the ability for the user to specify the width should be added.
If so, I can send a PR.
Cheers
On Sun, Nov 22, 2015 at 1:39 AM, Jagrut Sharma wrote:
> Since version 1.5.0,
The Data Mining Group (http://dmg.org/), which created PMML, is working on a
new standard called PFA that indeed uses JSON documents; see
http://dmg.org/pfa/docs/motivation/ for details.
PFA could be the answer to your option c.
Regards,
Vincenzo
On Wed, Nov 18, 2015 at 12:03 PM, Nick Pentreath
Also, does groupByKey keep all values of the pair RDD for a key in an in-memory
iterable on the reducer? That would lead to an OutOfMemoryError if the values
for a key exceed the memory of that node.
1. Is there a way to spill that to disk?
2. If not, is it feasible to partition the pair RDD using a custom
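For reference, a toy sketch of the behavior in question (made-up data); groupByKey materializes all values of a key as one iterable, while reduceByKey folds them incrementally so the full list never has to sit in one node's memory:

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

val grouped = pairs.groupByKey()       // ("a", [1, 3]): held in memory per key
val summed  = pairs.reduceByKey(_ + _) // ("a", 4): combined map-side first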
Since version 1.5.0, calling show(false) on a DataFrame prevents truncation of
long strings in the output. By default, strings longer than 20 characters are
truncated.
Example usage:
scala> df.show(false)
--
Jagrut
On Sat, Nov 21, 2015 at 6:24 AM, Fengdong Yu wrote:
> Hi,
>
>
Hi Nick
I started this thread. IMHO we need something like Spark to train our
models. The resulting models are typically small enough to easily fit on a
single machine. My real-time production system is not built on Spark. The
real-time system needs to use the model to make predictions in real
Hello,
I’m at a loss trying to diagnose why my Spark job is failing (it works fine on
small data). It is failing during the repartition, or on the subsequent steps,
which then seem to fail and fall back to repartitioning.
I’ve tried adjusting every parameter I can find, but have had no success.
If you ask about trapping the SIGKILL signal in your script, see the
following:
http://linuxcommand.org/wss0160.php
Cheers
On Fri, Nov 20, 2015 at 10:02 PM, Vikram Kone wrote:
> I tried adding shutdown hook to my code but it didn't help. Still same
> issue
>
>
> On Fri,