Hi everyone,
I have the following piece of code. When I run it, it throws the error shown below; it seems that the SparkContext is not serializable, but I don't use the SparkContext for anything except the broadcast. [In fact, this code is in MLlib; I am just trying to broadcast the
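For context, a minimal sketch of the standard broadcast pattern (all names here are invented): only the Broadcast handle is referenced inside the closure, so the SparkContext is never pulled into a serialized task.

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("broadcast-example"))
// Create the broadcast on the driver...
val table = sc.broadcast(Map(1 -> "a", 2 -> "b"))
// ...and reference only table.value inside the closure, so the
// SparkContext itself is never captured and serialized into the task.
val result = sc.parallelize(1 to 2).map(k => table.value.getOrElse(k, "?")).collect()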
Hello,
I wrote some code to practice Spark SQL against the latest Spark version, but I get the compilation error below; it seems the implicit conversion from RDD to SchemaRDD doesn't work. If anybody can help me fix it, thanks a lot.
value registerAsTable is not a member of
I see Spark is using AvroRecordReaderBase, which is used to read Avro Container Files, which are different from Sequence Files. If anyone is using Avro Sequence Files with success and has an example, please let me know.
To be more specific, I'm working with a system that stores data in org.apache.avro.hadoop.io.AvroSequenceFile format. An AvroSequenceFile is "a wrapper around a Hadoop SequenceFile that also supports reading and writing Avro data."
It seems that Spark does not support this out of the box.
I got this working locally a little while ago when playing around with AvroKeyInputFile: https://gist.github.com/MLnick/5864741781b9340cb211
But I'm not sure about AvroSequenceFile. Any chance you have an example data file or some records?
On Sat, Jul 19, 2014 at 11:00 AM, Sparky gullo_tho...@bah.com wrote:
Thanks for the gist. I'm just now learning about Avro. I think when you use a DataFileWriter you are writing to an Avro Container File (which is different from an Avro Sequence File). I have a system where data was written to an HDFS Sequence File using AvroSequenceFile.Writer (which is a wrapper
Hi guys,
I'm trying to create a Spark uber jar with sbt, but I'm having a lot of problems... I want to use the following:
- Spark Streaming
- Kafka
- Elasticsearch
- HBase
The current jar size is about 60 MB and it's not working.
- When I deploy with spark-submit: it runs and exits without any error
- When I
Are you building / running with Java 6? I imagine your .jar file has more than 65536 entries, and Java 6 has various issues with jars this large. If possible, use Java 7 everywhere.
https://issues.apache.org/jira/browse/SPARK-1520
On Sat, Jul 19, 2014 at 2:30 PM, boci boci.b...@gmail.com wrote:
Hi!
I'm using Java 7. I found the problem: I wasn't calling start() and awaitTermination() on the streaming context. Now it works, BUT spark-submit never returns (it runs in the foreground and receives the Kafka streams)... What am I missing?
(I want to send the job to a standalone cluster worker process.)
b0c1
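For anyone hitting the same thing, a minimal skeleton of that pattern (names are placeholders): awaitTermination() blocks the driver until the job is stopped, which is exactly why the launching process stays in the foreground.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("KafkaToES")
val ssc = new StreamingContext(conf, Seconds(10))
// ... set up the Kafka input stream and the ES output here ...
ssc.start()             // starts the receivers and the job scheduler
ssc.awaitTermination()  // blocks the driver until the streaming job is stopped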
Can position be null? Looks like there may be constraints with predicate push
down in that case. https://github.com/apache/spark/pull/511/
On Jul 18, 2014, at 8:04 PM, Christos Kozanitis kozani...@berkeley.edu
wrote:
Hello,
What is the order in which Spark SQL deserializes Parquet
Hi,
I have a file called "out" with random numbers, one number per line. I am loading the complete file into an RDD and I want to create partitions with the help of the coalesce function.
This is my code snippet:
import scala.math.Ordered
import org.apache.spark.rdd.CoalescedRDD
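As an aside, a minimal sketch of the usual approach (file name and partition count as in the question, otherwise invented): call coalesce() on the RDD itself rather than importing CoalescedRDD, which is an internal class.

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("coalesce-example"))
val numbers = sc.textFile("out").map(_.trim.toDouble)
val repartitioned = numbers.coalesce(4)  // reduce to 4 partitions without a shuffle
println(repartitioned.partitions.length)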
Hi,
You can try setting the heap space to a higher value. Are you using an Ubuntu machine? In your .bashrc, set the following option:
export _JAVA_OPTIONS=-Xmx2g
This should set your heap size to a higher value.
Regards,
Madhura
Can you provide the code? Is Record a case class, and is it defined as a top-level object? Also, have you done import sqlContext._?
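To illustrate what that looks like in Spark 1.0.x, a sketch with invented names:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Record(key: Int, value: String)  // must be a top-level definition

object Example {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sql-example"))
    val sqlContext = new SQLContext(sc)
    import sqlContext._  // brings in createSchemaRDD, the implicit RDD -> SchemaRDD conversion
    val rdd = sc.parallelize(Seq(Record(1, "a"), Record(2, "b")))
    rdd.registerAsTable("records")  // compiles only once the implicit is in scope
    sql("SELECT key, value FROM records").collect().foreach(println)
  }
}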
On Sat, Jul 19, 2014 at 3:39 AM, junius junius.z...@gmail.com wrote:
Hello,
I wrote some code to practice Spark SQL against the latest Spark version, but I get
Hi experts,
Could you please help me get some insight into doing real-time segmentation (segmentation on demand) using Spark?
My use case is like this:
1) I am running a campaign
2) Customers subscribe to the campaign
3) The campaign runs for 2-3 hours
4) Estimated target
Hi Sean,
I was launching the Spark Streaming program from Eclipse, but now I'm
running it with the spark-submit script from the Spark distribution for
CDH4 at http://spark.apache.org/downloads.html, and it works just fine.
Thanks a lot for your help,
Greetings,
Juan
2014-07-16 12:58
Hi,
I am working with a small dataset (about 13 MB) on the spark-shell. After doing a groupBy on the RDD, I wanted to cache the RDD in memory, but I keep getting these warnings:
scala> rdd.cache()
res28: rdd.type = MappedRDD[63] at repartition at <console>:28
scala> rdd.count()
14/07/19 12:45:18 WARN
Thanks for the reply.
I am trying to save a huge file, in my case 60 GB. I think l.toSeq is going to collect all the data onto the driver, where I don't have that much space. Is there any possibility of using something like a MultipleOutputFormat class for a large file?
Thanks,
Durga.
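For reference, one common pattern that avoids collecting to the driver, sketched here with placeholder names (rdd and bucketOf are hypothetical): Hadoop's MultipleTextOutputFormat lets saveAsHadoopFile route each key to its own output file.

import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
import org.apache.spark.SparkContext._

// Route each record to an output file named after its key.
class KeyBasedOutput extends MultipleTextOutputFormat[Any, Any] {
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    key.toString
}

val pairs = rdd.map(rec => (bucketOf(rec), rec))  // bucketOf is a made-up bucketing function
pairs.saveAsHadoopFile("hdfs:///out", classOf[Any], classOf[Any], classOf[KeyBasedOutput])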
Hello,
I get a lot of these exceptions on my Mesos cluster when running Spark jobs:
14/07/19 16:29:43 WARN spark.network.SendingConnection: Error finishing
connection to prd-atl-mesos-slave-010/10.88.160.200:37586
java.net.ConnectException: Connection timed out
at
Hi guys!
I've run out of ideas... I created a Spark Streaming job (Kafka -> Spark -> ES).
If I start my app on my local machine (inside the editor, but connected to the real Kafka and ES), the application works correctly.
If I start it in my Docker container (same Kafka and ES, local mode (local[4]) like inside
I am a newbie looking for pointers on how to start debugging my Spark app, and I did not find a straightforward tutorial. Any help is appreciated.
Sent from my iPhone
160 GB of Parquet files (ca. 30 files, snappy-compressed, made by Cloudera Impala).
About 30 full table scans: take 3-5 columns out, then some normal Scala operations like substring, groupBy, filter; at the end, save as a file in HDFS.
yarn-client mode, 23 cores and 60 GB mem / node.
But it always fails!
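To make the shape of the job concrete, a rough sketch under invented path and column names (not the poster's actual code):

import org.apache.spark.SparkContext._
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
sqlContext.parquetFile("hdfs:///data/impala_table").registerAsTable("t")
val out = sqlContext.sql("SELECT a, b, c FROM t")
  .map(row => (row.getString(0).substring(0, 8), row.getLong(1)))  // substring on column a
  .filter(_._2 > 0)                                                // filter on column b
  .reduceByKey(_ + _)                                              // groupBy-style aggregation
out.saveAsTextFile("hdfs:///out/result")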
I'm still having trouble with this one.
Watching it, I've noticed that the first time around, the task size is large but not terrible (199 KB). It's on the second iteration of the optimization that the task size goes crazy (120 MB).
Does anybody have any ideas why this might be happening? Is there
Can you attach your code?
Thanks,
Yin
On Sat, Jul 19, 2014 at 4:10 PM, chutium teng@gmail.com wrote:
160 GB of Parquet files (ca. 30 files, snappy-compressed, made by Cloudera Impala).
About 30 full table scans: take 3-5 columns out, then some normal Scala operations like substring, groupBy,
Hi all,
Has anyone encountered the problem below, and how do you solve it? Any help would be appreciated.
Caused by: java.lang.UnsatisfiedLinkError:
org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V
at
Thanks Eric. That is the case as most of my fields are optional. So it
seems that the problem comes from Parquet.
On Sat, Jul 19, 2014 at 8:27 AM, Eric Friedman eric.d.fried...@gmail.com
wrote:
Can position be null? Looks like there may be constraints with predicate
push down in that case.
Could you collect debug-level logs and send them to us? Without logs it's hard to speculate about anything. :)
TD
On Sat, Jul 19, 2014 at 2:39 PM, boci boci.b...@gmail.com wrote:
Hi guys!
I've run out of ideas... I created a Spark Streaming job (Kafka -> Spark -> ES).
If I start my app on my local machine (inside
Probably you have - if not, try a very simple app in the Docker container and make sure it works. Sometimes resource contention/allocation can get in the way; this happened to me in a YARN container.
Also try a single worker thread.
Cheers,
k/
On Sat, Jul 19, 2014 at 2:39 PM, boci
I compiled Spark 1.0.1 with 2.3.0-cdh5.0.2 today...
No issues with the mvn compilation, but my sbt build keeps failing on the sql module...
I just saw that my Scala is at 2.11.0 (after a brew update)... not sure if that's why the sbt compilation is failing... retrying...
On Sat, Jul 19, 2014 at 6:16 PM,