Below works for me:
import org.apache.avro.Schema
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.{AvroJob, AvroKeyOutputFormat}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job
val job = Job.getInstance
val schema = Schema.create(Schema.Type.STRING)
AvroJob.setOutputKeySchema(job, schema)
records.map(item => (new AvroKey[String](item.getGridsumId), NullWritable.get()))
  .saveAsNewAPIHadoopFile(args(1), classOf[AvroKey[String]], classOf[NullWritable],
    classOf[AvroKeyOutputFormat[String]], job.getConfiguration)
Hi Michael,
Thanks for your reply. Is this the correct way to load data from Spark
into Parquet? Somehow it doesn't feel right. When we followed the steps
described for storing the data into Hive tables, everything was smooth; we
used HiveContext and the table is automatically recognised by
Hi,
I am a complete newbie to Spark and MapReduce frameworks and have a basic
question on the reduce function. I was working on the word count example
and was stuck at the reduce stage where the sum happens.
I am trying to understand how reduceByKey works in Spark using
Java as the
YES! This worked! thanks!
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Is-there-a-way-to-write-spark-RDD-to-Avro-files-tp10947p11245.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
I think your questions revolve around the reduce function here, which
is a function of 2 arguments returning 1, whereas in a Reducer, you
implement a function of many-to-many.
This API is simpler if less general. Here you provide an associative
operation that can reduce any 2 values down to 1
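To make that contrast concrete, here is a minimal plain-Java sketch (no Spark, just java.util; the class name, input words, and counts are made up for illustration) of how an associative two-argument function is enough to do reduceByKey-style word counting:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

public class ReduceByKeySketch {

    // Fold each (key, value) pair into a running result with an associative
    // two-argument function -- the same contract Spark's reduceByKey expects.
    public static Map<String, Integer> reduceByKey(
            List<Map.Entry<String, Integer>> pairs, BinaryOperator<Integer> f) {
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            result.merge(p.getKey(), p.getValue(), f);
        }
        return result;
    }

    public static void main(String[] args) {
        // Word-count style input: each word paired with 1.
        List<Map.Entry<String, Integer>> ones = List.of(
                Map.entry("spark", 1), Map.entry("reduce", 1), Map.entry("spark", 1));
        Map<String, Integer> counts = reduceByKey(ones, Integer::sum);
        System.out.println(counts.get("spark") + " " + counts.get("reduce"));  // prints: 2 1
    }
}
```

Because the function is associative, the order in which values are merged per key does not matter, which is exactly why Spark can apply it within and across partitions.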
Excellent, thank you!
On Sat, Aug 2, 2014 at 4:46 AM, Aaron Davidson ilike...@gmail.com wrote:
Ah, that's unfortunate; that definitely should be added. Using a
pyspark-internal method, you could try something like
javaIterator = rdd._jrdd.toLocalIterator()
it =
Hi Team,
Could you please help me resolve the above compilation issue.
Regards,
Rajesh
On Sat, Aug 2, 2014 at 2:02 AM, Madabhattula Rajesh Kumar
mrajaf...@gmail.com wrote:
Hi Team,
I'm not able to print the values from a Spark SQL JavaSchemaRDD. Please find
below my code
Hi Rajesh,
Can you recheck the version and your code again?
I tried the similar code below and it works fine (compiles and executes)...
// Apply a schema to an RDD of Java Beans and register it as a table.
JavaSchemaRDD schemaPeople = sqlCtx.applySchema(people, Person.class);
I noticed a misspelling in the compilation error (an extra letter 'a'):
new Function*a*
But in your code the spelling was right.
A bit confusing.
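For reference, the usual way to print the values is to map each Row to a String with an anonymous Function and print the collected result. The sketch below is a plain-Java stand-in (Row and Function here are minimal hypothetical local types, not the real Spark classes) just to show the correctly spelled anonymous-class form:

```java
import java.util.List;
import java.util.stream.Collectors;

public class SchemaRddPrintSketch {

    // Minimal local stand-ins for Spark's Row and Function, for illustration only.
    public interface Function<T, R> { R call(T t); }
    public record Row(String value) {
        public String getString(int i) { return value; }
    }

    public static List<String> mapRows(List<Row> rows, Function<Row, String> f) {
        return rows.stream().map(f::call).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("Michael"), new Row("Andy"));
        // Correctly spelled anonymous class: new Function<Row, String>() -- the
        // "Functiona" in the compiler error suggests a stray character near this token.
        List<String> names = mapRows(rows, new Function<Row, String>() {
            public String call(Row row) { return "Name: " + row.getString(0); }
        });
        names.forEach(System.out::println);
    }
}
```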
On Fri, Aug 1, 2014 at 1:32 PM, Madabhattula Rajesh Kumar
mrajaf...@gmail.com wrote:
Hi Team,
I'm not able to print the values from Spark SQL
Hi,
I am running Spark on a single-node cluster. I am able to run the example code in
Spark, like SparkPageRank.scala and SparkKMeans.scala, with the following command,
bin/run-example org.apache.spark.examples.SparkPageRank and the required
things
Now, I want to run the Pagerank.scala that is there in GraphX.
Try this:
./bin/run-example graphx.LiveJournalPageRank edge_list_file…
On Aug 2, 2014, at 5:55 PM, Deep Pradhan pradhandeep1...@gmail.com wrote:
Hi,
I am running Spark in a single node cluster. I am able to run the codes in
Spark like SparkPageRank.scala, SparkKMeans.scala by the following
Hi,
I just implemented our algorithm (like personalized PageRank) using the Pregel API,
and it seems to work well.
But I am wondering whether I can compute only some selected vertices (hubs), not
update every vertex…
Is it possible to do this using the Pregel API?
Or, more realistically, only
Hi,
I have implemented a Low Level Kafka Consumer for Spark Streaming using the
Kafka Simple Consumer API. This API gives better control over Kafka
offset management and recovery from failures. As the present Spark
KafkaUtils uses the High-Level Kafka Consumer API, I wanted to have a better
At 2014-08-02 21:29:33 +0530, Deep Pradhan pradhandeep1...@gmail.com wrote:
How should I run graphx codes?
At the moment it's a little more complicated to run the GraphX algorithms than
the Spark examples due to SPARK-1986 [1]. There is a driver program in
org.apache.spark.graphx.lib.Analytics
At 2014-08-02 19:04:22 +0200, Yifan LI iamyifa...@gmail.com wrote:
But I am wondering whether I can compute only some selected vertices (hubs), not
update every vertex…
is it possible to do this using the Pregel API?
The Pregel API already only runs vprog on vertices that received messages
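A toy, Spark-free simulation of that behavior (the vertex ids, values, and the vprog here are all hypothetical, invented for illustration) shows why seeding messages only at the hub vertices restricts computation to them:

```java
import java.util.HashMap;
import java.util.Map;

public class PregelSketch {

    // One toy superstep: only vertices that received a message run the vertex
    // program; all other vertices keep their previous value, mirroring how the
    // GraphX Pregel loop skips vertices with no incoming messages.
    public static Map<Integer, Double> superstep(Map<Integer, Double> values,
                                                 Map<Integer, Double> messages) {
        Map<Integer, Double> next = new HashMap<>(values);
        for (Map.Entry<Integer, Double> msg : messages.entrySet()) {
            // Hypothetical vprog: add the incoming message to the old value.
            next.put(msg.getKey(),
                     values.getOrDefault(msg.getKey(), 0.0) + msg.getValue());
        }
        return next;
    }

    public static void main(String[] args) {
        Map<Integer, Double> values = Map.of(1, 1.0, 2, 1.0, 3, 1.0);
        // Seed a message only at the "hub" vertex 1; vertices 2 and 3 are untouched.
        Map<Integer, Double> next = superstep(values, Map.of(1, 0.5));
        System.out.println(next.get(1) + " " + next.get(2));  // prints: 1.5 1.0
    }
}
```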
I ran into this issue as well. The workaround by copying jar and ivy
manually suggested by Shivaram works for me.
Sincerely,
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
On Fri, Aug 1, 2014 at 3:31
I am not a Mesos expert... but it sounds like there is some mismatch
between the size that Mesos is giving you and the maximum heap size of the
executors (-Xmx).
On Fri, Aug 1, 2014 at 12:07 AM, Gurvinder Singh gurvinder.si...@uninett.no
wrote:
It is not getting an out-of-memory exception. I am
We are investigating various ways to integrate with Tachyon. I'll note
that you can already use saveAsParquetFile and
parquetFile(...).registerAsTable(tableName) (soon to be registerTempTable
in Spark 1.1) to store data in Tachyon and query it with Spark SQL.
On Fri, Aug 1, 2014 at 1:42 AM,
The number of partitions (which determines the number of tasks) is fixed after
any shuffle and can be configured using 'spark.sql.shuffle.partitions'
through SQLConf (i.e. sqlContext.set(...)) or
SET spark.sql.shuffle.partitions=... in SQL. It is possible we will auto-select
this based on statistics
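For instance, using the SQL form mentioned above (the value 64 is an arbitrary example, not a recommendation):

```sql
-- Lower the post-shuffle partition count for subsequent queries.
SET spark.sql.shuffle.partitions=64;
```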