Hello,
Can someone please share their opinions on the options available for
running Spark Streaming jobs on YARN? The first thing that comes to my mind
is to use Slider. Googling for such experience didn't give me much. From my
experience running the same jobs on Mesos, I have two concerns: automatic
Hello all,
In the Mesos-related Spark docs (
http://spark.apache.org/docs/1.6.0/running-on-mesos.html#cluster-mode) I
found this statement:
Note that jars or python files that are passed to spark-submit should be
> URIs reachable by Mesos slaves, as the Spark driver doesn’t automatically
>
Hi Vinay,
I believe it's not possible, as the spark-shuffle code has to run in the
same JVM process as the Node Manager. I haven't heard anything about
on-the-fly bytecode loading in the Node Manager.
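For reference, the usual way that code ends up inside the Node Manager JVM is the external shuffle service registered as a YARN auxiliary service in yarn-site.xml, roughly like this (a sketch of the standard Spark-on-YARN setup; it also assumes the spark-<version>-yarn-shuffle.jar has been placed on the Node Manager classpath):
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>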
Thanks, Alex.
On Wed, Mar 16, 2016 at 10:12 AM, Vinay Kashyap wrote:
> Hi
Hi Angel,
Your x() function returns an Any type, thus there is no Ordering[Any]
defined in the scope and it doesn't make sense to define one. Basically
it's the same as ordering plain Java Objects, which don't have any fields.
So the problem is with your x() function; make sure it returns something
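A hypothetical sketch of the problem and the fix (your actual x() isn't shown in the thread, and sc here is the spark-shell context):
// x is typed as Any, so sortBy cannot resolve an implicit Ordering[Any]
def x(s: String): Any = s.length
// sc.parallelize(Seq("a", "bb", "ccc")).sortBy(x)  // won't compile: no implicit Ordering defined for Any

// narrowing the return type to a concrete, ordered type makes the implicit resolvable
def xFixed(s: String): Int = s.length
val sorted = sc.parallelize(Seq("a", "bb", "ccc")).sortBy(xFixed)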
<moshir.mik...@gmail.com>
wrote:
> Hi Alex,
> thanks for the link. Will check it.
> Does someone know of a more streamlined approach ?
>
> On Mon, Feb 29, 2016 at 10:28, Alex Dzhagriev <dzh...@gmail.com> wrote:
>
>> Hi Moshir,
>>
>
Hi Moshir,
I think you can use the REST API provided with Spark:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala
Unfortunately, I haven't found any documentation for it, but it looks fine.
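For illustration only, a sketch of what a submission POST against that server could look like (the host, port, jar path, class name and Spark version below are all assumptions of mine, not taken from any docs):
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

// everything in this payload is a placeholder, not a real job
val payload =
  """{
    |  "action" : "CreateSubmissionRequest",
    |  "appResource" : "hdfs:///jobs/my-job.jar",
    |  "mainClass" : "com.example.MyJob",
    |  "appArgs" : [ ],
    |  "clientSparkVersion" : "1.6.0",
    |  "environmentVariables" : { },
    |  "sparkProperties" : {
    |    "spark.app.name" : "my-job",
    |    "spark.master" : "spark://master-host:6066",
    |    "spark.submit.deployMode" : "cluster"
    |  }
    |}""".stripMargin

// port and endpoint path follow the common standalone-mode defaults (an assumption)
val conn = new URL("http://master-host:6066/v1/submissions/create")
  .openConnection().asInstanceOf[HttpURLConnection]
conn.setRequestMethod("POST")
conn.setRequestProperty("Content-Type", "application/json")
conn.setDoOutput(true)
val out = conn.getOutputStream
out.write(payload.getBytes(StandardCharsets.UTF_8))
out.close()
// the response is JSON describing the submission (including a submissionId)
println(scala.io.Source.fromInputStream(conn.getInputStream).mkString)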
Thanks, Alex.
On Sun, Feb 28, 2016 at 3:25
there is a section that is connected to your question
>
> On 23 February 2016 at 16:49, Alex Dzhagriev <dzh...@gmail.com> wrote:
>
>> Hello all,
>>
>> Can someone please advise me on the pros and cons on how to allocate the
>> resources: many small heap machin
Hello all,
Can someone please advise me on the pros and cons of how to allocate the
resources: many small-heap machines with 1 core each, or a few machines with
big heaps and many cores? I'm sure that depends on the data flow and there is
no best-practice solution. E.g. with a bigger heap I can perform
Hi Saif,
You can put your files into one directory and read them as text. Another
option is to read them separately and then union the datasets.
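A minimal sketch of both options (the paths are made up):
// Option 1: point textFile at the directory so that all files under it are read together
val all = sc.textFile("/data/input/")

// Option 2: read the files separately and union the resulting RDDs
val part1 = sc.textFile("/data/input/part1.txt")
val part2 = sc.textFile("/data/input/part2.txt")
val combined = part1.union(part2)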
Thanks, Alex.
On Mon, Feb 22, 2016 at 4:25 PM, wrote:
> Hello all, I am facing a silly data question.
>
> If I have +100
Hello all,
I'm using Spark 1.6 and trying to cache a dataset which is 1.5 TB; I have
only ~800 GB of RAM in total, so I am choosing the DISK_ONLY storage level.
Unfortunately, I'm exceeding the overhead memory limit:
Container killed by YARN for exceeding memory limits. 27.0 GB of 27 GB
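For context, the setup looks roughly like this (a sketch; the input path is a placeholder), and the usual first knob for that YARN error is the executor memory overhead:
import org.apache.spark.storage.StorageLevel

// stand-in for the real 1.5 TB input described above
val dataset = sc.textFile("/data/big-input/")
dataset.persist(StorageLevel.DISK_ONLY)

// typically submitted with a larger overhead, e.g. (the value is purely illustrative):
// spark-submit ... --conf spark.yarn.executor.memoryOverhead=4096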
that one column is missing
>
> *scala> ttt.first*
> *res81: Invoice = Invoice(360,10/02/2014,"?2,500.00",?0.00)*
>
> it seems that I am missing the last column here!
>
> I suspect the cause of the problem is the "," used in "?2,500.00" which is
Hi Mich,
You can use DataFrames (
http://spark.apache.org/docs/latest/sql-programming-guide.html#dataframes)
to achieve that.
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)
val rdd = sc.textFile("/data/stg/table2")
//...
//perform your business logic, cleanups, etc.
//...
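If the files are CSV with quoted fields (as in the example above), another option, which is a suggestion of mine rather than something from the thread, is the external spark-csv package; it handles commas inside quoted values (the header/quote options are guesses about the file layout):
// requires the com.databricks:spark-csv package on the classpath
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("quote", "\"")
  .load("/data/stg/table2")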
Hello all,
Is anybody aware of any plans to support cartesian for Datasets? Are there
any ways to work around this issue without switching to RDDs?
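For what it's worth, the only workaround sketch I can offer does briefly round-trip through RDDs (which this question hopes to avoid); the datasets are made-up examples and sqlContext is assumed to be in scope:
import sqlContext.implicits._

// made-up datasets; any element types with encoders in scope work the same way
val left = Seq(1, 2, 3).toDS()
val right = Seq("a", "b").toDS()

// cartesian is defined on RDDs, so drop down, take the product, and rebuild a Dataset
val product = sqlContext.createDataset(left.rdd.cartesian(right.rdd))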
Thanks, Alex.