An error occurred when writing Parquet files to disk.
Any advice?
I want to know the reason. Thanks.
```
16/03/29 18:31:48 WARN hadoop.ParquetOutputCommitter: could not write summary file for file:/tmp/goods/2015-6
java.lang.NullPointerException
at
```
I think too many Parquet files may affect read performance, so I used Hadoop Archive to combine them, but sql_context.read.parquet(output_path) does not work on the archived file.
How can I fix it? Please help me.
:)
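If the number of output files is the underlying concern, one common approach is to coalesce the DataFrame before writing. A minimal sketch, assuming Spark 1.4+; the partition count and output path are made up for illustration:

```python
# Hypothetical sketch: write fewer, larger Parquet part-files by coalescing
# the DataFrame down to a small number of partitions before the write.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="coalesce-parquet-sketch")
sql_context = SQLContext(sc)

df = sql_context.read.parquet("/tmp/goods/2015-6")            # path taken from the log above
df.coalesce(8).write.parquet("/tmp/goods/2015-6-compacted")   # 8 partitions is an arbitrary example
```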
It works on Spark 1.4.
Thanks a lot.
2015-09-09 17:21 GMT+08:00 Cheng Lian <lian.cs@gmail.com>:
> You need to use "har://" instead of "hdfs://" to read HAR files. Just
> tested against Spark 1.5, and it works as expected.
>
> Cheng
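A minimal sketch of what that looks like from PySpark; the archive name and paths are made up for illustration:

```python
# Hypothetical sketch: read Parquet data that was packed into a Hadoop Archive.
# Note the har:// scheme rather than hdfs://, as suggested above.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="read-har-sketch")
sql_context = SQLContext(sc)

# Layout inside a HAR path: har://<fs>-<host>/<path to .har>/<dir inside the archive>,
# or har:///<path to .har>/<dir> when the default filesystem is used.
df = sql_context.read.parquet("har:///user/hadoop/goods.har/2015-6")
df.show()
```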
Hi, everyone.
I have a question about loading data: which is more efficient, the Spark data source API or plain JDBC?
What is the difference between loading data using JDBC and loading data using the Spark data source API?
Or the difference between loading data using mongo-hadoop and loading data using the native Java driver?
Which way is better?
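For a concrete point of comparison, here is a minimal sketch of the data source API route for a JDBC table; the connection URL, table name, and credentials are placeholders:

```python
# Hypothetical sketch: load a JDBC table through the Spark data source API.
# The result is a DataFrame, so filters and projections can be applied before
# collecting anything, instead of pulling rows through a hand-written JDBC loop.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="jdbc-datasource-sketch")
sql_context = SQLContext(sc)

df = (sql_context.read.format("jdbc")
      .options(url="jdbc:mysql://db-host:3306/shop",  # placeholder connection URL
               dbtable="goods",                       # placeholder table name
               user="spark",
               password="secret")
      .load())
print(df.count())
```

The appropriate JDBC driver jar still has to be available on the executor classpath.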
Did you try changing the value of 'spark.default.parallelism' to a bigger number?
2015-06-05 17:56 GMT+08:00 Evo Eftimov evo.efti...@isecc.com:
It may be that your system runs out of resources (i.e. 174 is the ceiling) due to the following:
1. RDD Partition = (Spark) Task
2.
Just multiply 2-4 by the number of CPU cores on the node.
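A minimal sketch of applying that rule of thumb from PySpark; the cluster size is an example (e.g. 3 nodes with 8 cores each, as described later in this thread):

```python
# Hypothetical sketch: derive spark.default.parallelism from the
# "2-4 tasks per CPU core" rule of thumb and create the context with it.
from pyspark import SparkConf, SparkContext

nodes = 3           # example node count
cores_per_node = 8  # example cores per node
parallelism = nodes * cores_per_node * 3   # pick a factor in the 2-4 range

conf = (SparkConf()
        .setAppName("parallelism-sketch")
        .set("spark.default.parallelism", str(parallelism)))
sc = SparkContext(conf=conf)
```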
2015-06-05 18:04 GMT+08:00 ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com:
I did not change spark.default.parallelism.
What is the recommended value for it?
On Fri, Jun 5, 2015 at 3:31 PM, 李铖 lidali...@gmail.com wrote:
Did you try changing
In my dev-test environment I have 3 virtual machines; every machine has 12 GB of memory and 8 CPU cores.
Here are spark-defaults.conf and spark-env.sh. Maybe some config is not right.
I run this command: *spark-submit --master yarn-client --driver-memory 7g --executor-memory 6g /home/hadoop/spark/main.py*
Any help, please?
Help me get the configuration right.
李铖 lidali...@gmail.com wrote on Tuesday, April 7, 2015:
In my dev-test environment I have 3 virtual machines; every machine has 12 GB of memory and 8 CPU cores.
Here are spark-defaults.conf and spark-env.sh. Maybe some config is not right.
I run this command: *spark-submit
Again, when I run a Spark SQL query over a larger file, an error occurs. Has anyone fixed this? Please help me.
Here is the stack trace.
```
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0
at
```
2015-03-26 23:01 GMT+08:00 Michael Armbrust mich...@databricks.com:
I would suggest looking for errors in the logs of your executors.
On Thu, Mar 26, 2015 at 3:20 AM, 李铖 lidali...@gmail.com wrote:
Again, when I run a Spark SQL query over a larger file, an error
It is OK when I query data from a small HDFS file.
But if the HDFS file is 152 MB, I get this exception.
I tried sc.setSystemProperty('spark.kryoserializer.buffer.mb', '256'), but the error persists.
```
com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 39135
at
```
the 2nd one is larger (it seems that Kryo doesn’t check for it).
Cheng
On 3/25/15 7:31 PM, 李铖 wrote:
Here is the full stack trace:
```
15/03/25 17:48:34 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, cloud1): com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required
```
the full stack trace?
On 3/25/15 6:26 PM, 李铖 wrote:
It is OK when I query data from a small HDFS file.
But if the HDFS file is 152 MB, I get this exception.
I tried sc.setSystemProperty('spark.kryoserializer.buffer.mb', '256'), but the error persists.
```
com.esotericsoftware.kryo.KryoException
$WriterThread.run(PythonRDD.scala:203)
```
2015-03-26 10:39 GMT+08:00 李铖 lidali...@gmail.com:
Yes, it works after I appended the two properties to spark-defaults.conf.
As I program in Python on the Spark platform, the Python API does not have the SparkConf API.
Thanks.
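For reference, pyspark does ship a SparkConf class, so the same settings can also be made programmatically. A minimal sketch; the second property name and the buffer sizes are assumptions based on this thread rather than the exact values used:

```python
# Hypothetical sketch: raise the Kryo serializer buffers before the context
# is created. Property names and sizes are assumed from the discussion above.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("kryo-buffer-sketch")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryoserializer.buffer.mb", "256")       # initial buffer per task
        .set("spark.kryoserializer.buffer.max.mb", "512"))  # must fit the largest record
sc = SparkContext(conf=conf)
```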
2015-03-25 21:07 GMT+08:00 Cheng Lian
Hi, everybody.
I am new to Spark. Now I want to do interactive SQL queries using Spark SQL.
Spark SQL can run on top of Hive or load files directly from HDFS.
Which is better or faster?
Thanks.
Even Hive tables are usually read from files on HDFS.
You should probably use HiveContext, as its query language is more powerful than SQLContext's. Also, Parquet is usually the fastest data format for Spark SQL.
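A minimal sketch of that recommendation in PySpark; the table and file names are placeholders:

```python
# Hypothetical sketch: use HiveContext for the richer HiveQL dialect and query
# Parquet data either through a Hive table or directly from files on HDFS.
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="hivecontext-sketch")
hive_context = HiveContext(sc)

# Query an existing Hive table (its data typically lives on HDFS anyway).
hive_context.sql("SELECT count(*) FROM goods").show()

# Or load Parquet files directly and register them for interactive SQL.
df = hive_context.read.parquet("hdfs:///data/goods.parquet")
df.registerTempTable("goods_parquet")
hive_context.sql("SELECT * FROM goods_parquet LIMIT 10").show()
```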
On Tue, Mar 17, 2015 at 3:41 AM, 李铖 lidali...@gmail.com wrote:
Hi, everybody.
I am