Thanks very much, Gerard and Manas, for your input. I'll keep the connection
pooling part in mind.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Create-one-DB-connection-per-executor-tp26588p26601.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
I understand that using foreachPartition I can create one DB connection at
the partition level. Is there a way to create a DB connection at the
executor level and share it across all partitions/tasks run within that
executor? One approach I am considering is to have a singleton with, say, a
getConnection
will be less)
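A sketch of that singleton idea (illustrative names; in practice the String would be a real java.sql.Connection from DriverManager.getConnection): since each Spark executor is a separate JVM, a Scala `object` with a `lazy val` is initialized at most once per executor, so every partition processed on that executor sees the same instance.

```scala
object DbConnectionPool {
  // How many times the "connection" was actually created (for illustration).
  var initCount = 0

  // A lazy val is initialized at most once per JVM -- i.e. once per
  // executor, since each Spark executor runs in its own JVM. Replace the
  // String with a real java.sql.Connection in practice.
  private lazy val connection: String = {
    initCount += 1
    "connected"
  }

  def getConnection: String = connection
}

// Usage inside foreachPartition (sketch):
// rdd.foreachPartition { partition =>
//   val conn = DbConnectionPool.getConnection // shared by tasks on this executor
//   partition.foreach { row => /* use conn */ }
// }
```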
I hope to get some guidance on which parameter I can use in order to
avoid this issue entirely.
I am guessing spark.shuffle.io.preferDirectBufs = false, but I am not sure.
..Manas
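For what it's worth, this is how that guessed setting would be passed, as a config sketch (whether it actually avoids the issue above is unverified; the class and jar names are placeholders):

```shell
# Sketch: passing the guessed setting via spark-submit.
# com.example.MyApp and my-app.jar are placeholders.
spark-submit \
  --conf spark.shuffle.io.preferDirectBufs=false \
  --class com.example.MyApp \
  my-app.jar
```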
On Tue, Mar 15, 2016 at 2:30 PM, Iain Cundy <iain.cu...@amdocs.com> wrote:
> Hi Manas
>
I am using Spark 1.6.
I am not using any broadcast variables myself.
The broadcast variable is probably used internally by the state management
of mapWithState.
...Manas
On Tue, Mar 15, 2016 at 10:40 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> Which version of Spark are you using?
>
> Can you show t
[Int] from a row instead of Int from a
dataframe?
...Manas
Some more description:
/* My case class */
case class Student(name: String, age: Option[Int])
val s = new Student("Manas", Some(35))
val s1 = new Student("Manas1", None)
val student = sc.makeRDD(List(s, s1)).toDF
/*Now w
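A Spark SQL Row stores a missing Option[Int] as null, so one common pattern is to wrap the raw value in Option before casting. A sketch (with a real org.apache.spark.sql.Row you would use row.isNullAt(i) / row.getInt(i); here a plain IndexedSeq[Any] stands in for the Row so the snippet is self-contained):

```scala
// Null-safe extraction of an Int column as Option[Int].
// A Vector[Any] stands in for a Row's positional slots; null = missing.
def getIntOption(row: IndexedSeq[Any], i: Int): Option[Int] =
  Option(row(i)).map(_.asInstanceOf[Int]) // Option(null) == None

val r1 = Vector("Manas", 35)    // age present
val r2 = Vector("Manas1", null) // age missing

// getIntOption(r1, 1) == Some(35); getIntOption(r2, 1) == None
```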
[AvroKeyInputFormat[myObject]],
classOf[AvroKey[myObject]],
classOf[NullWritable])
Basically I would like to end up having a tuple of (FileName,
AvroKey[MyObject, NullWritable])
Any help is appreciated.
.Manas
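One possible route, as a sketch I have not run against a real cluster: in Spark 1.x, the RDD returned by newAPIHadoopFile can be cast to NewHadoopRDD, whose mapPartitionsWithInputSplit exposes the Hadoop InputSplit, and a FileSplit carries the file path. myObject, AvroKey, AvroKeyInputFormat and `path` are as in the question; the Spark-specific part is left commented out, and only the small path helper is plain Scala.

```scala
// Plain-Scala helper: last component of a path string.
def fileNameOf(path: String): String = path.split('/').last

// Sketch of the Spark side (untested; needs spark-core and avro-mapred):
// val rdd = sc.newAPIHadoopFile(path,
//   classOf[AvroKeyInputFormat[myObject]],
//   classOf[AvroKey[myObject]],
//   classOf[NullWritable])
// val withNames = rdd.asInstanceOf[NewHadoopRDD[AvroKey[myObject], NullWritable]]
//   .mapPartitionsWithInputSplit { (split, iter) =>
//     val name = fileNameOf(split.asInstanceOf[FileSplit].getPath.toString)
//     iter.map { case (k, v) => (name, (k, v)) }
//   }
```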
Trying to bump up the rank of the question.
Can someone point to an example on GitHub?
..Manas
On Fri, Apr 3, 2015 at 9:39 AM, manasdebashiskar manasdebashis...@gmail.com
wrote:
Hi experts,
I am trying to write unit tests for my spark application which fails
Source)
Thanks
Manas
/content/cloudera/en/downloads.html) super easy.
Currently they are on Spark 1.2.
..Manas
On Mon, Mar 30, 2015 at 1:34 PM, vance46 wang2...@purdue.edu wrote:
Hi all,
I'm a newbie trying to set up Spark for my research project on a RedHat
system. I've downloaded spark-1.3.0.tgz and untarred
= "org.scalanlp" %% "breeze-natives" % V.breeze
val config = "com.typesafe" % "config" % V.config
}
There are only a few more things to try (like reverting back to Spark 1.1)
before I run out of ideas completely.
Please share your insights.
..Manas
On Wed, Mar 11, 2015 at 9:44 AM, Sean Owen
If you want to ask questions like "what is near me?", these are the basic
steps:
1) Index your geometry data using an R-Tree.
2) Write your joiner logic so it takes advantage of the index tree to give
you faster access.
Thanks
Manas
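The two steps above can be sketched in plain Scala. A real implementation would use an R-tree library; the uniform grid below is only an illustrative stand-in showing the same idea (bucket the data spatially, then probe only nearby buckets), and all names here are invented for the example.

```scala
// Step 1-2 with a uniform grid instead of an R-tree: bucket each point by
// cell, then answer "near me" queries by probing only neighbouring cells.
case class Point(id: String, x: Double, y: Double)

class GridIndex(points: Seq[Point], cell: Double) {
  private val buckets: Map[(Int, Int), Seq[Point]] =
    points.groupBy(p => ((p.x / cell).floor.toInt, (p.y / cell).floor.toInt))

  // All points within `radius` of (x, y), touching only nearby cells.
  def near(x: Double, y: Double, radius: Double): Seq[Point] = {
    val (cx, cy) = ((x / cell).floor.toInt, (y / cell).floor.toInt)
    val reach = (radius / cell).ceil.toInt
    for {
      i <- (cx - reach) to (cx + reach)
      j <- (cy - reach) to (cy + reach)
      p <- buckets.getOrElse((i, j), Nil)
      if math.hypot(p.x - x, p.y - y) <= radius
    } yield p
  }
}
```

The joiner logic then calls `near` for each query point instead of scanning the whole dataset.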
On Wed, Mar 11, 2015 at 5:55 AM, Andrew Musselman
andrew.mussel
(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Thanks
Manas
The above is a great example using a thread.
Does anyone have an example using a Scala/Akka Future to do the same?
I am looking for a similar example that uses an Akka Future and does
something if the Future times out.
On Tue, Mar 3, 2015 at 7:00 AM, Kartheek.R kartheek.m...@gmail.com wrote:
On Tue, Mar 3, 2015 at 9:16 AM, Manas Kar manasdebashis...@gmail.com
wrote
is executed does not help, as the actual message gets buried in the logs.
How does one go about debugging such a case?
Also, is there a way I can wrap my function inside some sort of timer-based
environment so that if it took too long I could throw a stack trace or
something of the sort?
Thanks
Manas
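A sketch of that timer-based wrapper, using scala.concurrent.Future (Akka's Futures were merged into the Scala standard library in 2.10, so the same pattern covers the Akka Future question above): run the function in a Future and bound the wait with Await.result; on timeout you get a TimeoutException whose stack trace you can log.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import java.util.concurrent.TimeoutException

// Run `body` asynchronously and wait at most `timeout` for the result.
// Right(value) on success, Left(exception) if the deadline passes.
def withTimeout[T](timeout: Duration)(body: => T): Either[Throwable, T] =
  try Right(Await.result(Future(body), timeout))
  catch { case e: TimeoutException => Left(e) }

// withTimeout(1.second)(42)                        -> Right(42)
// withTimeout(100.millis){ Thread.sleep(1000); 1 } -> Left(TimeoutException)
```

Note that the timed-out computation keeps running on its thread; the wrapper only stops waiting for it.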
Hi experts,
I am using Spark 1.2 from CDH5.3.
When I issue commands like
myRDD.take(10), the result gets truncated after 4-5 records.
Is there a way to configure it to show more items?
..Manas
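For what it's worth, a sketch of one way around display truncation (a local Seq stands in for myRDD here): RDD.take(10) already returns an ordinary local Array of up to 10 elements, so printing element-by-element sidesteps any shell-side truncation of long collection displays.

```scala
// A local Seq stands in for myRDD. take(10) hands back a plain local
// collection; printing each element avoids the REPL truncating the
// collection's toString display.
val data = (1 to 100).map(i => s"record-$i")
val first10 = data.take(10)
first10.foreach(println)
```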
)
at
org.apache.spark.deploy.master.Master.finishApplication(Master.scala:653)
at
org.apache.spark.deploy.master.Master$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$29.apply(Master.scala:399)
Can anyone help?
..Manas
this behaviour.
I can see my output correctly in HDFS and all.
I will give it one more try after increasing the master's memory (which
defaults to 296 MB) to 512 MB.
..manas
On Thu, Feb 12, 2015 at 2:14 PM, Arush Kharbanda ar...@sigmoidanalytics.com
wrote:
How many nodes do you have in your cluster, how
I have 5 workers, each with 8 GB of executor memory. My driver memory is
8 GB as well. They are all 8-core machines.
To answer Imran's question my configurations are thus.
executor_total_max_heapsize = 18GB
This problem happens at the end of my program.
I don't have to run a lot of jobs to see
Hi Experts,
I have recently installed HDP 2.2 (which depends on Hadoop 2.6).
My Spark 1.2 is built with the Hadoop 2.4 profile.
My program has the following dependencies:
val avro = "org.apache.avro" % "avro-mapred" % "1.7.7"
val spark = "org.apache.spark" % "spark-core_2.10" % "1.2.0" %
"provided"
My
Hi,
I have a spark cluster that has 5 machines with 32 GB memory each and 2
machines with 24 GB each.
I believe spark.executor.memory assigns the same executor memory to all
executors.
How can I use 32 GB of memory on the first 5 machines and 24 GB on the
other 2?
Thanks
..Manas
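If this is the standalone cluster manager, spark.executor.memory is indeed a single value per application, but the memory each worker offers can differ per machine: SPARK_WORKER_MEMORY is read from each node's own conf/spark-env.sh. A config sketch with the values from the question:

```shell
# conf/spark-env.sh on each of the 5 machines with 32 GB
# (standalone mode; the value is per machine, not per cluster):
export SPARK_WORKER_MEMORY=32g

# conf/spark-env.sh on each of the 2 machines with 24 GB:
export SPARK_WORKER_MEMORY=24g
```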
every time because of the volume of
data.
...Manas
*For some reason I have never got any reply to my emails to the user group.
I am hoping to break that trend this time. :)*
Hi,
I am using a library that parses AIS messages. My code, which follows these
simple steps, gives me null values in the Date field.
1) Get the message from a file.
2) Parse the message.
3) Map the message RDD to keep only (Date, SomeInfo).
4) Take the top 100 elements.
Result = the Date field appears fine
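The four steps can be sketched as follows; AisMessage and parseAis are hypothetical stand-ins for the actual AIS library (which is not named here), and a local Seq stands in for the RDD, so this only shows the shape of the pipeline, not the real parser behaviour.

```scala
// Hypothetical stand-in for the real AIS parsing library: the "message"
// is epoch-millis before a comma, info after it.
case class AisMessage(date: Option[java.util.Date], info: String)
def parseAis(line: String): AisMessage = {
  val Array(ts, info) = line.split(",", 2)
  AisMessage(Some(new java.util.Date(ts.toLong)), info)
}

// Steps 1-4 on a local Seq standing in for the RDD of raw messages:
val raw = Seq("1000,vesselA", "2000,vesselB")      // 1) messages from file
val top = raw
  .map(parseAis)                                    // 2) parse
  .collect { case AisMessage(Some(d), i) => (d, i) }// 3) keep (Date, Info)
  .sortBy(_._1.getTime).reverse                     // 4) "top" by Date,
  .take(100)                                        //    newest first
```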
, SecurityManager}
build.scala
http://apache-spark-user-list.1001560.n3.nabble.com/file/n7796/build.scala
I appreciate the great work the Spark community is doing. It is by far the
best thing I have worked on.
..Manas
folder of step2.
Hope this saves some time for someone who has a similar problem.
..Manas
:)
In doing so I don't want to push the parsed data to disk and then re-obtain
it via the Scala class. Is there a way I can achieve what I want
efficiently?
..Manas
as to how to do it easily?
Thanks
Manas
www.exactearth.com
Manas Kar
Intermediate Software Developer, Product Development | exactEarth Ltd.
60 Struck Ct. Cambridge, Ontario N1R 8L2
office. +1.519.622.4445