wrong.
This is especially true if the people you think are wrong are actually correct.
Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466
> On Aug 2, 2017, at 6:25 AM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>
> Hi,
>
> I am definitely sure
<— core is Java
https://github.com/hail-is/hail <— core is Scala, mostly used through Python wrappers
neuroscience:
https://github.com/thunder-project/thunder#using-with-spark <—
Hard to say with #1 without knowing your application’s characteristics; for #2,
we use conductor (https://github.com/BD2KGenomics/conductor) with IAM roles and
.boto/.aws/credentials files.
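For reference, a shared credentials file at `~/.aws/credentials` usually looks like the following (placeholder values; the profile name and keys depend on your account setup):

```ini
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```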
> On Mar 15, 20
if we support that right now for feature data;
those are fairly new.
Regards,
On Jun 9, 2015, at 9:21 PM, roni roni.epi...@gmail.com wrote:
Hi Frank,
Thanks for the reply. I downloaded ADAM and built
with ADAMContext.loadFeatures. We have two tools for the
overlap computation: you can use a BroadcastRegionJoin if one of the datasets
you want to overlap is small, or a ShuffleRegionJoin if both datasets are large.
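To make the distinction concrete, here is a minimal sketch of the broadcast-join idea in plain Scala. Collections stand in for RDDs, and `Region`/`broadcastRegionJoin` are illustrative names, not ADAM's actual API:

```scala
// Minimal sketch of the broadcast-join idea behind BroadcastRegionJoin:
// when one region dataset is small, ship it to every task and test each
// record of the large dataset against it locally, avoiding a shuffle.
// Plain Scala collections stand in for RDDs; the names are illustrative,
// not ADAM's actual API.
case class Region(contig: String, start: Long, end: Long) {
  def overlaps(that: Region): Boolean =
    contig == that.contig && start < that.end && that.start < end
}

def broadcastRegionJoin(small: Seq[Region], large: Seq[Region]): Seq[(Region, Region)] =
  // In Spark, `small` would be a broadcast variable captured by the tasks.
  for {
    l <- large
    s <- small
    if l.overlaps(s)
  } yield (l, s)
```

A shuffle join, by contrast, partitions both datasets by genomic range so that potentially overlapping regions land in the same partition.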
Regards,
On Jun 8
You’ll definitely want to use a Kryo-based serializer for Avro. We have a
Kryo-based serializer that wraps the efficient Avro serializer here.
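For context, wiring a custom Kryo registrator into Spark comes down to two configuration properties, shown here as a spark-defaults.conf fragment (the registrator class name is illustrative; it would be the class that registers the Avro-wrapping serializers):

```
spark.serializer        org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator  com.example.MyAvroKryoRegistrator
```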
On Apr 3, 2015, at 5:41 AM, Akhil Das ak...@sigmoidanalytics.com
with more resources (memory capacity and bandwidth, and disk bandwidth).
When you increase the number of tasks executing on a single node, you do not
increase the pool of available resources.
On Feb 21, 2015, at 4:11
Unless I misunderstood your question, you’re looking for the val clusterCenters
in
http://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.mllib.clustering.KMeansModel,
no?
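Sketched without Spark, what `clusterCenters` gives you is an array of center vectors that you can use, for example, to assign new points to their nearest cluster. MLlib exposes `val clusterCenters: Array[Vector]` on KMeansModel; plain `Array[Double]` vectors stand in for `Vector` here:

```scala
// Assign a point to the nearest of a set of cluster centers, the typical
// use of KMeansModel.clusterCenters. Array[Double] stands in for
// MLlib's Vector type; the function name is illustrative.
def nearestCenter(point: Array[Double], centers: Array[Array[Double]]): Int = {
  def sqDist(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
  centers.indices.minBy(i => sqDist(point, centers(i)))
}
```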
On Feb 5, 2015, at 2
, I would be glad to port it for the Spark core.
Regards,
On Jan 22, 2015, at 7:11 AM, Venkat, Ankam ankam.ven...@centurylink.com wrote:
Thanks Frank for your response.
So, creating a custom InputFormat
on in that project, so I’m not sure if it is the
cleanest way, but it is a workable way.
Regards,
On Jan 21, 2015, at 9:17 AM, Venkat, Ankam ankam.ven...@centurylink.com wrote:
I am trying to solve a similar problem. I am
Shailesh,
To add, are you packaging Hadoop in your app? Hadoop will pull in Guava. Not
sure if you are using Maven (or something else) to build, but if you can pull up
your build’s dependency tree, you will likely find com.google.guava being brought
in by one of your dependencies.
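With a Maven build, the dependency tree can be filtered down to the artifact in question (sbt and Gradle have equivalent commands):

```shell
# Print the build's dependency tree, showing only paths that bring in Guava.
mvn dependency:tree -Dincludes=com.google.guava
```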
Regards,
Frank Austin
an Avro HadoopInputFormat.
Regards,
On Nov 5, 2014, at 1:25 PM, Simone Franzini captainfr...@gmail.com wrote:
How can I read/write Avro specific records?
I found several snippets using generic records, but nothing
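One common approach with Spark's Hadoop-file APIs is sketched below. It is untested here and needs the `avro-mapred` artifact on the classpath; `MyRecord` stands for your generated specific-record class:

```scala
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable

// Read an Avro file of specific records as an RDD of MyRecord
// (MyRecord is a placeholder for your generated Avro class).
val records = sc.newAPIHadoopFile(
    "hdfs:///path/to/records.avro",
    classOf[AvroKeyInputFormat[MyRecord]],
    classOf[AvroKey[MyRecord]],
    classOf[NullWritable])
  .map { case (key, _) => key.datum() }
```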
) will be added in Spark 1.0.1 (which is currently available, BTW).
Regards,
On Sep 26, 2014, at 7:38 AM, matthes mdiekst...@sensenetworks.com wrote:
Thank you Jey,
That is a nice introduction, but it may be too old
Matthes,
Ah, gotcha! Repeated items in Parquet seem to correspond to the ArrayType in
Spark-SQL. I only use Spark, but it does look like that should be supported in
Spark-SQL 1.1.0. I’m not sure though if you can apply predicates on repeated
items from Spark-SQL.
Regards,
Frank Austin
Hi Mohan,
It’s a bit convoluted to follow in their source, but they essentially typedef
KSerializer as being a KryoSerializer, and then their serializers all extend
KSerializer. Spark should identify them properly as Kryo Serializers, but I
haven’t tried it myself.
Regards,
Frank Austin
for local file
access. This is used to implement the rdd.pipe method (IIRC), and we use it in
some downstream apps to do IO with processes that we spawn from mapPartitions
calls (see here and here).
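The pattern described above can be sketched in plain Scala using `scala.sys.process`. A plain Iterator stands in for a Spark partition, and `cat` for the external tool; inside Spark you would do this from mapPartitions (or just use rdd.pipe), and the function name is illustrative:

```scala
import scala.sys.process._
import java.io.ByteArrayInputStream

// Pipe a partition's elements through an external process via its
// stdin/stdout. `part` stands in for a Spark partition iterator and
// `cmd` is the external command to spawn.
def pipeThrough(part: Iterator[String], cmd: String): Iterator[String] = {
  val stdin  = new ByteArrayInputStream(part.mkString("", "\n", "\n").getBytes)
  val stdout = (cmd #< stdin).!!   // run the command, capturing its stdout
  stdout.split("\n").iterator
}
```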
Regards,
On Sep 8, 2014, at 12:28 AM, Tomer Benyamini tomer@gmail.com wrote:
~/ephemeral-hdfs/sbin/start-mapred.sh does not exist on spark-1.0.2;
I restarted hdfs using ~/ephemeral-hdfs/sbin/stop-dfs.sh
Hi Zhen,
The Scala iterator trait supports cloning via the duplicate method
(http://www.scala-lang.org/api/current/index.html#scala.collection.Iterator@duplicate:(Iterator[A],Iterator[A])).
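In use, `duplicate` looks like this:

```scala
// Iterator.duplicate returns a pair of independent iterators over the
// same elements, buffering whatever one side has read ahead of the other.
// The original iterator should not be used after the call.
val (first, second) = Iterator(1, 2, 3).duplicate
val doubled   = first.map(_ * 2).toList   // List(2, 4, 6)
val untouched = second.toList             // List(1, 2, 3)
```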
Regards,
On Jun 13
Robert,
You can build a Spark application using Maven for Hadoop 2 by adding a
dependency on the Hadoop 2.* hadoop-client package. If you define any
Hadoop Input/Output formats, you may also need to depend on the
hadoop-mapreduce package.
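In a pom.xml, the dependency described above looks like this (the version shown is illustrative; use the Hadoop 2.x release you deploy against):

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.6.0</version>
</dependency>
```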
Regards,
On Jun 29, 2014, at 4:20 PM, Robert James srobertja...@gmail.com wrote:
/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/rdd/ADAMRDDFunctions.scala,
starting at line 62. There is a bit of setup necessary for the Parquet
write codec, but otherwise it is fairly straightforward.
request; I haven't seen that on my end.
Regards,
On Fri, Apr 18, 2014 at 8:57 PM, Aureliano Buendia buendia...@gmail.com wrote:
Hi,
Since 0.9.0 spark-ec2 has gone unstable. During launch it throws many
errors
about this at
http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/.
On Thu, Apr 3, 2014 at 7:16 AM, Ian O'Connell i...@ianoconnell.com wrote:
Objects being transformed need to be one of these in flight