I'm using 1.0.4
Thanks,
--
Pei-Lun
On Fri, Mar 27, 2015 at 2:32 PM, Cheng Lian lian.cs@gmail.com wrote:
Hm, which version of Hadoop are you using? Actually there should also be
a _metadata file together with _common_metadata. I was using Hadoop 2.4.1
btw. I'm not sure whether Hadoop
Ok.
I modified as per your suggestions
export SPARK_HOME=/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4
export SPARK_JAR=$SPARK_HOME/lib/spark-assembly-1.3.0-hadoop2.4.0.jar
export HADOOP_CONF_DIR=/apache/hadoop/conf
cd $SPARK_HOME
./bin/spark-sql -v --driver-class-path
If you can share the stack trace, then we can give you proper guidance.
For running on YARN, everything is described here:
https://spark.apache.org/docs/latest/running-on-yarn.html
Thanks
Best Regards
On Fri, Mar 27, 2015 at 8:21 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
Hello,
Can
Hi,
Does anyone have a similar request?
https://issues.apache.org/jira/browse/SPARK-6561
When we save a DataFrame into Parquet files, we also want to have it
partitioned.
The proposed API looks like this:
def saveAsParquet(path: String, partitionColumns: Seq[String])
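If the proposal were adopted, usage might look like the following sketch (column names hypothetical), with each distinct combination of the partition columns written to its own sub-directory:

df.saveAsParquet("/data/events", Seq("year", "month"))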
--
Jianshi Huang
LinkedIn:
What operation are you doing? I'm assuming you have enabled RDD compression
and there is an empty stream which it tries to uncompress (as seen
from the exceptions).
Thanks
Best Regards
On Fri, Mar 27, 2015 at 7:15 AM, Chen Song chen.song...@gmail.com wrote:
Using spark 1.3.0 on cdh5.1.0,
Awesome.
Thanks
Best Regards
On Fri, Mar 27, 2015 at 7:26 AM, donhoff_h 165612...@qq.com wrote:
Hi, Akhil
Yes, that's where the problem lies. Thanks very much for pointing out my mistake.
-- Original --
*From: * Akhil Das;ak...@sigmoidanalytics.com;
*Send time:*
Like this?
val krdd = testrdd.map(x => {
  try {
    val key = x.split(sep1)(0)
    (key, x)
  } catch {
    case e: Exception =>
      println("Exception!! = " + e + " |||KS1 " + x)
      (null, x)
  }
})
Thanks
Best Regards
On Thu,
I have a few tables that are created in Hive. I want to transform data stored
in these Hive tables using Spark SQL. Is this even possible?
So far I have seen that I can create new tables using the Spark SQL dialect.
However, when I run show tables or desc hive_table, it says table not
found.
I am now
We can set a path, refer to the unit tests. For example:
df.saveAsTable("savedJsonTable", "org.apache.spark.sql.json", "append", path=tmpPath)
https://github.com/apache/spark/blob/master/python/pyspark/sql/tests.py
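For reference, a rough Scala equivalent (path hypothetical), assuming the Spark 1.3 saveAsTable overload that takes a source, SaveMode and options map:

import org.apache.spark.sql.SaveMode
df.saveAsTable("savedJsonTable", "org.apache.spark.sql.json", SaveMode.Append,
  Map("path" -> "/tmp/test-table"))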
Investigating some more, I found that the table is being created at the
specified
https://issues.apache.org/jira/browse/SPARK-6570
I also left in the call to saveAsParquetFile(), as it produced a similar
exception (though there was no use of explode there).
On Fri, Mar 27, 2015 at 7:20 AM, Cheng Lian lian.cs@gmail.com wrote:
This should be a bug in the Explode.eval(),
Hi all,
We have a workflow that pulls in data from CSV files. The original setup
of the workflow was to parse the data as it comes in (turn it into an array),
then store it. This resulted in out-of-memory errors with larger files (as a
result of increased GC?).
It turns out if the data gets
This should be a bug in Explode.eval(), which always assumes the
underlying SQL array is represented by a Scala Seq. Would you mind
opening a JIRA ticket for this? Thanks!
Cheng
On 3/27/15 7:00 PM, Jon Chase wrote:
Spark 1.3.0
Two issues:
a) I'm unable to get a lateral view explode
It seems Spark SQL accesses some more columns apart from those created by Hive.
You can always recreate the tables (you would need to execute the table
creation scripts), but it would be good to avoid recreation.
On Fri, Mar 27, 2015 at 3:20 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
I did copy
Spark 1.3.0
Two issues:
a) I'm unable to get a lateral view explode query to work on an array type
b) I'm unable to save an array type to a Parquet file
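For context, a minimal sketch (schema, table name and paths hypothetical, not the original code) of the kind of array column and query involved; sqlContext is assumed to be a HiveContext:

case class Rec(id: Int, values: Array[Int])
val df = sqlContext.createDataFrame(Seq(Rec(1, Array(1, 2, 3)), Rec(2, Array(4, 5))))
df.registerTempTable("recs")
// Issue a): the lateral view explode query over the array column
sqlContext.sql("SELECT id, v FROM recs LATERAL VIEW explode(values) tmp AS v").collect()
// Issue b): saving the array column to Parquet
df.saveAsParquetFile("/tmp/recs.parquet")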
I keep running into this:
java.lang.ClassCastException: [I cannot be cast to
scala.collection.Seq
Here's a stack trace from the
Thanks for the information. Verified that the _common_metadata and
_metadata files are missing in this case when using Hadoop 1.0.4. Would
you mind opening a JIRA for this?
Cheng
On 3/27/15 2:40 PM, Pei-Lun Lee wrote:
I'm using 1.0.4
Thanks,
--
Pei-Lun
On Fri, Mar 27, 2015 at 2:32 PM, Cheng
In our application we load our historical data into 40 partitioned RDDs
(no. of available cores x 2), and we have not implemented any custom
partitioner.
After applying transformations on these RDDs, intermediate RDDs are created
which have more than 40 partitions, and sometimes partitions
I did copy hive-conf.xml from the Hive installation into spark-home/conf. It
does have all the metastore connection details: host, username, password,
driver and others.
Snippet
==
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
It happens only when a StorageLevel with replication is used (StorageLevel.
MEMORY_ONLY_2, StorageLevel.MEMORY_AND_DISK_2); StorageLevel.MEMORY_ONLY and
StorageLevel.MEMORY_AND_DISK work, so the problem must clearly be somewhere
in the Spark-Mesos integration. From the console I see that Spark is trying to replicate
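For reference, a minimal sketch (RDD contents hypothetical) of the persist call that triggers the problem for us on fine-grained Mesos:

import org.apache.spark.storage.StorageLevel

val rdd = sc.parallelize(1 to 1000000)
rdd.persist(StorageLevel.MEMORY_ONLY_2)   // fails; StorageLevel.MEMORY_ONLY works
rdd.count()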
Hi all!
I am trying to install Spark on my standalone machine. I am able to run the
master, but when I try to run the slaves it gives me the following error. Any
help in this regard will be highly appreciated.
localhost: failed to launch
Hi,
The behaviour is the same for me in Scala and Python, so posting here in
Python. When I use DataFrame.saveAsTable with the path option, I expect an
external Hive table to be created at the specified path. Specifically, when
I call:
df.saveAsTable(..., path="/tmp/test")
I expect an external
Hi.
In HiveContext, when I run the statement DROP TABLE IF EXISTS TestTable and
TestTable doesn't exist, Spark returns an error:
ERROR Hive: NoSuchObjectException(message:default.TestTable table not found)
at
Another follow-up: saveAsTable works as expected when running on a Hadoop
cluster with Hive installed. It's just locally that I'm getting this
strange behaviour. Any ideas why this is happening?
Kind Regards.
Tom
On 27 March 2015 at 11:29, Tom Walwyn twal...@gmail.com wrote:
We can set a path,
mas mas.ha...@gmail.com writes:
Hi all!
I am trying to install Spark on my standalone machine. I am able to run the
master, but when I try to run the slaves it gives me the following error. Any
help in this regard will be highly appreciated.
Hello Jon,
Are you able to connect to an existing Hive installation and read tables created in Hive?
Regards,
deepak
On Thu, Mar 26, 2015 at 4:16 PM, Jon Chase jon.ch...@gmail.com wrote:
I've filed this as https://issues.apache.org/jira/browse/SPARK-6554
On Thu, Mar 26, 2015 at 6:29 AM, Jon Chase
Did you resolve this ? I am facing the same error
On Wed, Feb 11, 2015 at 1:02 PM, Arush Kharbanda ar...@sigmoidanalytics.com
wrote:
It seems that the HDFS path for the table doesn't contain any file/data.
Does the metastore contain the right path for the HDFS data?
You can find the HDFS path in
Since Hive and Spark SQL internally use HDFS and the Hive metastore, the only
thing you want to change is the processing engine. You can try to bring
your hive-site.xml to %SPARK_HOME%/conf/hive-site.xml (ensure that the
hive-site.xml captures the metastore connection details).
It's a hack, I haven't
This is exactly my case also, it worked, thanks Sean.
On 26 March 2015 at 23:35, Sean Owen so...@cloudera.com wrote:
You can do this much more simply, I think, with Scala's parallel
collections (try .par). There's nothing wrong with doing this, no.
Here, something is getting caught in your
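For what it's worth, a minimal sketch of the .par suggestion (the per-element work is a hypothetical stand-in); the elements are processed on a local thread pool on the driver, with no Spark involved:

def expensiveLocalComputation(i: Int): Int = i * i   // stand-in for the real per-element work
val inputs = (1 to 100).toList
val results = inputs.par.map(expensiveLocalComputation).toList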
Hello,
Is there any published community roadmap for SparkSQL and the DataSources
API?
Regards,
Ashish
Forgot to mention: would you mind also providing the full stack
trace of the exception thrown in the saveAsParquetFile call? Thanks!
Cheng
On 3/27/15 7:35 PM, Jon Chase wrote:
https://issues.apache.org/jira/browse/SPARK-6570
I also left in the call to saveAsParquetFile(), as it
Hi experts!
I would like to know: is there any way to store a SchemaRDD to Cassandra?
If yes, then how do I store it in an existing Cassandra column family and in a new
column family?
Thanks
I can recreate tables, but what about the data? It looks like this is an obvious
feature that Spark SQL must have. People will want to transform tons
of data stored in HDFS through Hive from Spark SQL.
The Spark programming guide suggests it's possible: Spark SQL also supports reading and writing
More info
When using *spark.mesos.coarse* everything works as expected. I think this
must be a bug in the Spark-Mesos integration.
2015-03-27 9:23 GMT+01:00 Ondrej Smola ondrej.sm...@gmail.com:
It happens only when StorageLevel is used with 1 replica ( StorageLevel.
Hello,
I want to check if there is any way to verify the integrity of the data
files. The use case is to perform data integrity checks on large files with 100+
columns and reject records (write them to another file) that do not meet
criteria such as NOT NULL, date format, etc. Since there are a lot of
Each RDD is composed of multiple blocks known as partitions. When you apply
a transformation over it, it can grow in size depending on the operation
(as the number of objects/references increases), and that is probably the reason why
you are seeing an increased number of partitions.
I don't think increased
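A quick way to observe this in the shell (path and numbers hypothetical): check the partition count before and after a transformation, and use repartition (or coalesce) for explicit control.

val base = sc.textFile("hdfs:///data/history", 40)
println(base.partitions.length)              // 40, as requested at load time
val grouped = base.map(line => (line.take(8), line)).groupByKey()
println(grouped.partitions.length)           // may differ, depending on the partitioner chosen
val fixed = grouped.repartition(40)          // or coalesce(n) to shrink without a full shuffle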
I tried the following
1)
./bin/spark-submit -v --master yarn-cluster --driver-class-path
Done. I also updated the name on the ticket to include both issues.
Spark SQL arrays: explode() fails and cannot save array type to Parquet
https://issues.apache.org/jira/browse/SPARK-6570
On Fri, Mar 27, 2015 at 8:14 AM, Cheng Lian lian.cs@gmail.com wrote:
Forgot to mention that, would
It's not possible to configure Spark to do checks based on XML. You would
need to write jobs to do the validations you need.
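A minimal sketch of such a validation job (paths, column positions and rules are hypothetical), splitting records into accepted and rejected outputs:

import java.text.SimpleDateFormat

val raw = sc.textFile("/data/input.csv")
val dateFormat = "yyyy-MM-dd"

def isValid(cols: Array[String]): Boolean = {
  cols.length >= 100 &&
  cols(0).trim.nonEmpty &&                                         // NOT NULL check on the first column
  (try { new SimpleDateFormat(dateFormat).parse(cols(3)); true }   // date-format check on a hypothetical date column
   catch { case _: Exception => false })
}

val parsed = raw.map(_.split(",", -1))
parsed.filter(isValid).map(_.mkString(",")).saveAsTextFile("/data/valid")
parsed.filter(r => !isValid(r)).map(_.mkString(",")).saveAsTextFile("/data/rejected")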
On Fri, Mar 27, 2015 at 5:13 PM, Sathish Kumaran Vairavelu
vsathishkuma...@gmail.com wrote:
Hello,
I want to check if there is any way to check the data integrity of
Show us the code. This shouldn't happen for the simple process you described
Sent from my rotary phone.
On Mar 27, 2015, at 5:47 AM, jamborta jambo...@gmail.com wrote:
Hi all,
We have a workflow that pulls in data from CSV files. The original setup
of the workflow was to parse
Thanks for the detailed information!
On 3/27/15 9:16 PM, Jon Chase wrote:
Done. I also updated the name on the ticket to include both issues.
Spark SQL arrays: explode() fails and cannot save array type to
Parquet
https://issues.apache.org/jira/browse/SPARK-6570
On Fri, Mar 27, 2015 at
jamborta :
Please also describe the format of your csv files.
Cheers
On Fri, Mar 27, 2015 at 6:42 AM, DW @ Gmail deanwamp...@gmail.com wrote:
Show us the code. This shouldn't happen for the simple process you
described
Sent from my rotary phone.
On Mar 27, 2015, at 5:47 AM, jamborta
Ankur,
The JavaKinesisWordCountASLYARN example is no longer valid; it was added just to the
EMR build back in 1.1.0 to demonstrate Spark Streaming with Kinesis on YARN.
Just follow the stock example, JavaKinesisWordCountASL, as it is
better form anyway, given it is best not to hard-code the
This will be fixed in https://github.com/apache/spark/pull/5230/files
On Fri, Mar 27, 2015 at 9:13 AM, Peter Mac peter.machar...@noaa.gov wrote:
I downloaded spark version spark-1.3.0-bin-hadoop2.4.
When the python version of sql.py is run the following error occurs:
[root@nde-dev8-template
I have a very strange error in Spark 1.3 where at runtime in the
org.apache.spark.ui.JettyUtils object the method createServletHandler is not
found
Exception in thread main java.lang.NoSuchMethodError:
Hi,
Now I have 10 edge data files in my HDFS directory, e.g. edges_part00,
edges_part01, …, edges_part09
format: srcId tarId
(They make a good partitioning of the whole graph, so I never expect any
change (re-partitioning operations) on them during graph building.)
I am thinking of how to
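For what it's worth, a minimal sketch (path hypothetical): GraphLoader.edgeListFile parses the "srcId tarId" format directly, so the ten edge files can be loaded with a glob, and each input file/block becomes one or more edge partitions:

import org.apache.spark.graphx.GraphLoader

val graph = GraphLoader.edgeListFile(sc, "hdfs:///graph/edges_part*")
println(graph.edges.partitions.length)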
Hello, I'm trying to develop with the new DataFrame API, but I'm running into
an error.
I have an existing MySQL database and I want to insert rows.
I create a DataFrame from an RDD, then use the insertIntoJDBC function.
It appears that DataFrames reorder the data inside them.
As a result, I
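For illustration, a minimal sketch (URL, table and schema hypothetical) of building the DataFrame with an explicit schema so the column order is fixed before calling insertIntoJDBC (assuming the Spark 1.3 DataFrame.insertIntoJDBC method):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true)))
val rows = sc.parallelize(Seq(Row(1, "alice"), Row(2, "bob")))
val df = sqlContext.createDataFrame(rows, schema)

// Columns are written in schema order, which should match the MySQL table definition.
df.insertIntoJDBC("jdbc:mysql://localhost:3306/test?user=u&password=p", "people", false)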
I checked the ports using netstat and don't see any connections established
on that port. Logs show only this:
15/03/27 13:50:48 INFO Master: Registering app NetworkWordCount
15/03/27 13:50:48 INFO Master: Registered app NetworkWordCount with ID
app-20150327135048-0002
Spark ui shows:
Running
JettyUtils is marked with:
private[spark] object JettyUtils extends Logging {
FYI
On Fri, Mar 27, 2015 at 9:50 AM, kmader kevin.ma...@gmail.com wrote:
I have a very strange error in Spark 1.3 where at runtime in the
org.apache.spark.ui.JettyUtils object the method createServletHandler is
not
Hi,
I am just running this simple example with
machineA: 1 master + 1 worker
machineB: 1 worker
«
val ssc = new StreamingContext(sparkConf, Duration(1000))
val rawStreams = (1 to numStreams).map(_ =>
  ssc.rawSocketStream[String](host, port, StorageLevel.MEMORY_ONLY_SER)).toArray
val
If it is deterministically reproducible, could you generate full DEBUG
level logs, from the driver and the workers and give it to me? Basically I
want to trace through what is happening to the block that is not being
found.
And can you tell me what cluster manager you are using? Spark Standalone,
Hi,
I have a simple Spark application: it creates an input RDD with
sc.textFile, and it calls flatMapToPair, reduceByKey and map on it. The
output RDD is small, a few MBs. Then I call collect() on the output.
If the text file is ~50GB, it finishes in a few minutes. However, if it's
larger
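For reference, a Scala analogue of the pipeline described (input path hypothetical), which can help narrow down where the larger input starts to fail:

val input = sc.textFile("hdfs:///data/big.txt")
val counts = input
  .flatMap(line => line.split("\\s+").map(word => (word, 1L)))   // flatMapToPair equivalent
  .reduceByKey(_ + _)
  .map { case (word, count) => word + "\t" + count }
val result = counts.collect()   // the collected output is only a few MB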
Hi Kelvin,
Thank you. That works for me. I wrote my own joins that produced Scala
collections, instead of using rdd.join.
Regards,
Yang
On Thu, Mar 26, 2015 at 5:51 PM, Kelvin Chu 2dot7kel...@gmail.com wrote:
Hi, I used union() before and yes it may be slow sometimes. I _guess_ your
variable
It is just a comma-separated file, about 10 columns wide, which we append
with a unique id and a few additional values.
On Fri, Mar 27, 2015 at 2:43 PM, Ted Yu yuzhih...@gmail.com wrote:
jamborta :
Please also describe the format of your csv files.
Cheers
On Fri, Mar 27, 2015 at 6:42 AM, DW
Upon reviewing your other thread, could you confirm that your Hive
metastore that you can connect to via Hive is a MySQL database? And to
also confirm, when you're running spark-shell and doing a show tables
statement, you're getting the same error?
On Fri, Mar 27, 2015 at 6:08 AM ÐΞ€ρ@Ҝ (๏̯͡๏)
I downloaded spark version spark-1.3.0-bin-hadoop2.4.
When the python version of sql.py is run the following error occurs:
[root@nde-dev8-template python]#
/root/spark-1.3.0-bin-hadoop2.4/bin/spark-submit sql.py
Spark assembly has been built with Hive, including Datanucleus jars on
classpath
I ran a spark streaming job.
100 executors
30G heap per executor
4 cores per executor
The version I used is 1.3.0-cdh5.1.0.
The job is reading from a directory on HDFS (with files incoming
continuously) and does some join on the data. I set batch interval to be 15
minutes and the job worked
Hello,
Well, all problems you want to solve with technology need a good
justification for that technology. So the first thing is to ask
which technology fits your current and future problems. This is also what
the article says. Unfortunately, it only provides a vague answer
Yes, only when using fine grained mode and replication
(StorageLevel.MEMORY_ONLY_2
etc).
2015-03-27 19:06 GMT+01:00 Tathagata Das t...@databricks.com:
Does it fail with just Spark jobs (using storage levels) on non-coarse
mode?
TD
On Fri, Mar 27, 2015 at 4:39 AM, Ondrej Smola
Do you have the logs of the driver? Does that give any exceptions?
TD
On Fri, Mar 27, 2015 at 12:24 PM, Chen Song chen.song...@gmail.com wrote:
I ran a spark streaming job.
100 executors
30G heap per executor
4 cores per executor
The version I used is 1.3.0-cdh5.1.0.
The job is reading
Are you running on yarn?
- If you are running in yarn-client mode, set HADOOP_CONF_DIR to
/etc/hive/conf/ (or the directory where your hive-site.xml is located).
- If you are running in yarn-cluster mode, the easiest thing to do is to
add --files=/etc/hive/conf/hive-site.xml (or the path for
Hi Spark group,
We haven't been able to find clear descriptions of how Spark handles the
resiliency of RDDs in relation to executing actions with side effects.
If you do an `rdd.foreach(someSideEffect)`, then you are doing a
side-effect for each element in the RDD. If a partition goes down --
If you invoke this, you will get at-least-once semantics on failure.
For instance, if a machine dies in the middle of executing the foreach
for a single partition, that will be re-executed on another machine.
It could even fully complete on one machine, but the machine dies
immediately before
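A minimal sketch (record type and sink are hypothetical) of making the side effect idempotent, so that at-least-once re-execution of a partition does not duplicate work:

case class Event(id: String, payload: String)

def upsert(e: Event): Unit = {
  // Idempotent write keyed on e.id: running it twice leaves the sink in the same state.
  // Replace with your real sink (JDBC upsert, HBase put, etc.).
  println(s"PUT ${e.id} -> ${e.payload}")
}

val events = sc.parallelize(Seq(Event("a", "1"), Event("b", "2")))
events.foreachPartition(iter => iter.foreach(upsert))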
Can you try specifying the number of partitions when you load the data to
equal the number of executors? If your ETL changes the number of
partitions, you can also repartition before calling KMeans.
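A minimal sketch of both suggestions (path, executor count and k are hypothetical), assuming a spark-shell with sc in scope:

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val numExecutors = 8
val data = sc.textFile("hdfs:///data/points.txt", numExecutors)
  .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
  .repartition(numExecutors)   // re-balance in case the ETL changed the partition count
  .cache()

val model = KMeans.train(data, 10, 20)   // k = 10 clusters, 20 iterations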
On Thu, Mar 26, 2015 at 8:04 PM, Xi Shen davidshe...@gmail.com wrote:
Hi,
I have a large
Yes, I could recompile the HDFS client with more logging, but I don't have the
day or two to spare this week.
One more thing about this: the cluster is Hortonworks 2.1.3 [.0].
They seem to claim support for Spark on Hortonworks 2.2.
Dale.
From: Ted Yu
This is a PR in review to support ORC via the SQL data source API:
https://github.com/apache/spark/pull/3753. You can try pulling that PR
and help test it. -Xiangrui
On Wed, Mar 25, 2015 at 5:03 AM, Zsolt Tóth toth.zsolt@gmail.com wrote:
Hi,
I use sc.hadoopFile(directory,
Hi Martin,
Could you attach the code snippet and the stack trace? The default
implementation of some methods uses reflection, which may be the
cause.
Best,
Xiangrui
On Wed, Mar 25, 2015 at 3:18 PM, zapletal-mar...@email.cz wrote:
Thanks Peter,
I ended up doing something similar. I however
This sounds like a bug ... Did you try a different lambda? It would be
great if you can share your dataset or re-produce this issue on the
public dataset. Thanks! -Xiangrui
On Thu, Mar 26, 2015 at 7:56 AM, Ravi Mody rmody...@gmail.com wrote:
After upgrading to 1.3.0, ALS.trainImplicit() has been
Remember that article that went viral on HN? (Where a guy showed how GraphX
/ Giraph / GraphLab / Spark have worse performance on a 128-node cluster than on
a single-threaded machine? If not, here is the article -
http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html)
Well as you may
While looking into an issue, I noticed that the source displayed on the GitHub
site does not match the downloaded tar for 1.3.
Thoughts?
The source code should match the Spark commit
4aaf48d46d13129f0f9bdafd771dd80fe568a7dc. Do you see any differences?
On Fri, Mar 27, 2015 at 11:28 AM, Manoj Samel manojsamelt...@gmail.com wrote:
While looking into an issue, I noticed that the source displayed on the GitHub
site does not match the
Does it fail with just Spark jobs (using storage levels) on non-coarse mode?
TD
On Fri, Mar 27, 2015 at 4:39 AM, Ondrej Smola ondrej.sm...@gmail.com
wrote:
More info
when using *spark.mesos.coarse* everything works as expected. I think
this must be a bug in spark-mesos integration.
Hi Martin,
In the short term: Would you be able to work with a different type other
than Vector? If so, then you can override the *Predictor* class's *protected
def featuresDataType: DataType* with a DataFrame type which fits your
purpose. If you need Vector, then you might have to do a hack
Hi everyone,
I had a lot of questions today, sorry if I'm spamming the list, but I
thought it's better than posting all questions in one thread. Let me know
if I should throttle my posts ;)
Here is my question:
When I try to have a case class that has Any in it (e.g. I have a property
map and
Hi All,
I am running a spark cluster on EC2 instances of type: m3.2xlarge. I have
given 26gb of memory with all 8 cores to my executors. I can see that in
the logs too:
*15/03/27 21:31:06 INFO AppClient$ClientActor: Executor added:
app-20150327213106-/0 on
Seems like a bug, could you file a JIRA?
@Tim: Patrick said you look at Mesos-related issues. Could you take
a look at this? Thanks!
TD
On Fri, Mar 27, 2015 at 1:25 PM, Ondrej Smola ondrej.sm...@gmail.com
wrote:
Yes, only when using fine grained mode and replication
Hello,
I am using the Spark shell in Scala on the localhost. I am using sc.textFile
to read a directory. The directory looks like this (generated by another
Spark script):
part-0
part-1
_SUCCESS
The part-0 has four short lines of text while part-1 has two short
lines of text.
Yes, it works for me. Make sure the Spark machine can access the hive
machine.
On Thu, Mar 26, 2015 at 6:55 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
Did you manage to connect to the Hive metastore from Spark SQL? I copied the Hive
conf file into the Spark conf folder, but when I run show tables, or do
I looked at the 1.3.0 code and figured out where this can be added.
In org.apache.spark.deploy.yarn, ApplicationMaster.scala:282 is:
actorSystem = AkkaUtils.createActorSystem("sparkYarnAM",
Utils.localHostName, 0,
conf = sparkConf, securityManager = securityMgr)._1
If I change it to below,
Hi Rares,
The number of partitions is controlled by the HDFS input format, and one file may
have multiple partitions if it consists of multiple blocks. In your case, I think
there is one file with 2 splits.
Thanks.
Zhan Zhang
On Mar 27, 2015, at 3:12 PM, Rares Vernica
Probably a Guava version conflict issue. What Spark version did you use, and
which Hadoop version was it compiled against?
Thanks.
Zhan Zhang
On Mar 27, 2015, at 12:13 PM, Johnson, Dale
daljohn...@ebay.com wrote:
Yes, I could recompile the hdfs client with more logging,
The files sound too small to be 2 blocks in HDFS.
Did you set the defaultParallelism to be 3 in your spark?
Yong
Subject: Re: 2 input paths generate 3 partitions
From: zzh...@hortonworks.com
To: rvern...@gmail.com
CC: user@spark.apache.org
Date: Fri, 27 Mar 2015 23:15:38 +
Hi Rares,
I want to use ARIMA for a predictive model so that I can take time series
data (metrics) and perform a light anomaly detection. The time series data
is going to be bucketed to different time units (several minutes within
several hours, several hours within several days, several days within
several
I am working with the mllib.optimization.GradientDescent class and I'm
confused about how to set a custom loss function with setGradient.
For instance, if I wanted my loss function to be x^2 how would I go about
setting it using setGradient?
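A rough sketch of one way to do it (not from the MLlib docs, so treat the details as assumptions): setGradient takes a Gradient implementation, so for a squared loss you implement both compute overloads and attach the instance through an algorithm's optimizer, for example:

import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.optimization.Gradient
import org.apache.spark.mllib.regression.LinearRegressionWithSGD

class SquaredLossGradient extends Gradient {
  private def dot(x: Vector, y: Vector): Double =
    x.toArray.zip(y.toArray).map { case (a, b) => a * b }.sum

  // Returns (gradient, loss) for one data point: loss = (w.x - y)^2, gradient = 2 * (w.x - y) * x
  override def compute(data: Vector, label: Double, weights: Vector): (Vector, Double) = {
    val diff = dot(data, weights) - label
    (Vectors.dense(data.toArray.map(_ * 2.0 * diff)), diff * diff)
  }

  // Accumulating variant used by GradientDescent; simplified and allocation-heavy for clarity.
  override def compute(data: Vector, label: Double, weights: Vector, cumGradient: Vector): Double = {
    val (grad, loss) = compute(data, label, weights)
    val cum = cumGradient.toArray   // assumes a DenseVector, whose toArray exposes the backing array
    grad.toArray.zipWithIndex.foreach { case (g, i) => cum(i) += g }
    loss
  }
}

val lr = new LinearRegressionWithSGD()
lr.optimizer.setGradient(new SquaredLossGradient()).setNumIterations(100)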
Yes, I have done repartition.
I tried to repartition to the number of cores in my cluster. Not helping...
I tried to repartition to the number of centroids (k value). Not helping...
On Sat, Mar 28, 2015 at 7:27 AM Joseph Bradley jos...@databricks.com
wrote:
Can you try specifying the number
JIRA ticket created at:
https://issues.apache.org/jira/browse/SPARK-6581
Thanks,
--
Pei-Lun
On Fri, Mar 27, 2015 at 7:03 PM, Cheng Lian lian.cs@gmail.com wrote:
Thanks for the information. Verified that the _common_metadata and
_metadata file are missing in this case when using Hadoop
Hi,
I am not using HDFS, I am using the local file system. Moreover, I did not
modify the defaultParallelism. The Spark instance is the default one
started by Spark Shell.
Thanks!
Rares
On Fri, Mar 27, 2015 at 4:48 PM, java8964 java8...@hotmail.com wrote:
The files sound too small to be 2
Hi, I am following the instructions on this website:
http://www.infoobjects.com/spark-with-avro/
I installed the spark-avro library from https://github.com/databricks/spark-avro
on a machine which only has the Hive gateway client role on a Hadoop cluster.
Somehow I got an error on reading the Avro file.
(I bet the Spark implementation could be improved. I bet GraphX could
be optimized.)
Not sure about this one, but in core benchmarks often start by
assuming that the data is local. In the real world, data is unlikely
to be. The benchmark has to include the cost of bringing all the data
to the
I have increased the spark.storage.memoryFraction to 0.4 but I still get
OOM errors on Spark Executor nodes
15/03/27 23:19:51 INFO BlockManagerMaster: Updated info of block
broadcast_5_piece10
15/03/27 23:19:51 INFO TorrentBroadcast: Reading broadcast variable 5 took
2704 ms
15/03/27 23:19:52
Never mind. I found my Spark is still 1.2 but the Avro library requires 1.3.
Will try again.
On Fri, Mar 27, 2015 at 9:38 PM, Joanne Contact joannenetw...@gmail.com
wrote:
Hi, I am following the instructions on this website:
http://www.infoobjects.com/spark-with-avro/
I installed the spark-avro
Spark version is 1.3.0 with tachyon-0.6.1.
QUESTION DESCRIPTION: rdd.saveAsObjectFile("tachyon://host:19998/test") and
rdd.saveAsTextFile("tachyon://host:19998/test") succeed, but
rdd.toDF().saveAsParquetFile("tachyon://host:19998/test") fails.
ERROR MESSAGE: