Hi guys,
Does this issue affect 1.2.0 only or all previous releases as well?
Best Regards,
Jerry
On Thu, Jan 8, 2015 at 1:40 AM, Xuelin Cao xuelincao2...@gmail.com wrote:
Yes, the problem is that I've turned the flag on.
One possible reason for this is that the Parquet file supports predicate
Hi guys,
I'm interested in the IndexedRDD too.
How many rows in the big table match the small table in each run?
If the number of rows stays constant, then I think Jem wants the runtime to
stay roughly constant (i.e. ~0.6 seconds for all cases). However, I agree
with Andrew. The performance
There
is the parquet-mr project that uses Hadoop to do so. I am trying to write a
Spark job to do a similar kind of thing.
On Fri, Jan 9, 2015 at 3:20 AM, Jerry Lam chiling...@gmail.com wrote:
Hi spark users,
I'm using Spark SQL to create Parquet files on HDFS. I would like to
store the Avro schema
Hi spark users,
I'm using Spark SQL to create Parquet files on HDFS. I would like to store
the Avro schema in the Parquet metadata so that non-Spark SQL applications
can marshall the data without a separate Avro schema, using the Avro Parquet
reader. Currently, schemaRDD.saveAsParquetFile does not allow one to do
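One workaround, until saveAsParquetFile exposes the footer metadata, is to
write through parquet-avro directly; AvroParquetWriter embeds the Avro schema
string in the Parquet footer so the Avro Parquet reader can reconstruct
records without a side schema file. A minimal sketch only: the path and
schema below are made up, and the package is parquet.avro in the parquet-mr
1.x that ships with Spark 1.x:

    import org.apache.avro.Schema
    import org.apache.avro.generic.{GenericData, GenericRecord}
    import org.apache.hadoop.fs.Path
    import parquet.avro.AvroParquetWriter  // org.apache.parquet.avro in newer parquet-mr

    // Hypothetical Avro schema, for illustration only.
    val avroSchema = new Schema.Parser().parse(
      """{"type":"record","name":"User","fields":[{"name":"user_id","type":"string"}]}""")

    // The writer stores the Avro schema in the Parquet footer metadata, so
    // non-Spark readers (AvroParquetReader) can marshall the data back.
    val writer = new AvroParquetWriter[GenericRecord](
      new Path("hdfs:///tmp/users.parquet"), avroSchema)
    val record = new GenericData.Record(avroSchema)
    record.put("user_id", "1")
    writer.write(record)
    writer.close()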
Hi spark developers,
I was thinking it would be nice to extract the data lineage information
from a data processing pipeline. I assume that spark/tachyon keeps this
information somewhere. For instance, a data processing pipeline uses
datasource A and B to produce C. C is then used by another
Hi spark users,
I'm trying to create an external table using HiveContext after creating a
schemaRDD and saving the RDD into a Parquet file on HDFS.
I would like to use the schema in the schemaRDD (rdd_table) when I create
the external table.
For example:
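The original example is truncated above; what follows is only a sketch of the
kind of DDL involved, with hypothetical table name, columns, and path. As far
as I know, Spark 1.2 has no API that reuses the schemaRDD's schema in the DDL
automatically, so the columns are spelled out by hand:

    // A sketch only: names and path are made up.
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    rdd_table.saveAsParquetFile("hdfs:///warehouse/rdd_table")
    hiveContext.sql("""
      CREATE EXTERNAL TABLE rdd_table (user_id STRING, item_id STRING)
      STORED AS PARQUET  -- needs Hive 0.13+; older Hive needs explicit SERDE/INPUTFORMAT
      LOCATION 'hdfs:///warehouse/rdd_table'
    """)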
AFAIK.
On Fri, Dec 19, 2014 at 2:22 AM, Jerry Lam chiling...@gmail.com wrote:
Hi Spark users,
I wonder if val resultRDD = RDDA.union(RDDB) will always have records in
RDDA before records in RDDB.
Also, will resultRDD.coalesce(1) change this ordering?
Best Regards,
Jerry
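For what it's worth, here is a quick empirical check in the shell (a sketch;
this is the behaviour observed in practice, since union concatenates the
parent partitions in order and coalesce(1) without shuffle merges them in
that same order, but it is not a documented ordering contract):

    // A sketch: small in-memory RDDs to inspect the ordering.
    val RDDA = sc.parallelize(Seq(1, 2, 3), 2)
    val RDDB = sc.parallelize(Seq(4, 5, 6), 2)
    val resultRDD = RDDA.union(RDDB)

    // union keeps RDDA's partitions before RDDB's ...
    resultRDD.collect()              // Array(1, 2, 3, 4, 5, 6)

    // ... and coalesce(1) without shuffle merges partitions in order.
    resultRDD.coalesce(1).collect()  // Array(1, 2, 3, 4, 5, 6)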
Hi spark users,
Do you know how to use Spark SQL to read JSON files that are LZO compressed?
I'm looking into sqlContext.jsonFile, but I don't know how to configure it
to read LZO files.
Best Regards,
Jerry
On Wed, Dec 17, 2014 at 11:27 AM, Ted Yu yuzhih...@gmail.com wrote:
See this thread: http://search-hadoop.com/m/JW1q5HAuFv
which references https://issues.apache.org/jira/browse/SPARK-2394
Cheers
On Wed, Dec 17, 2014 at 8:21 AM, Jerry Lam chiling...@gmail.com wrote:
Hi spark users,
Do you know
at 8:33 AM, Jerry Lam chiling...@gmail.com wrote:
Hi Ted,
Thanks for your help.
I'm able to read LZO files using sparkContext.newAPIHadoopFile, but I
couldn't do the same for sqlContext because sqlContext.jsonFile does not
provide a way to configure the input file format. Do you know
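For reference, a sketch of that workaround: read the LZO files with the
hadoop-lzo input format (com.hadoop.mapreduce.LzoTextInputFormat, assuming
hadoop-lzo is installed on the cluster), then hand the resulting JSON strings
to Spark SQL with jsonRDD instead of jsonFile. The path is hypothetical:

    import com.hadoop.mapreduce.LzoTextInputFormat
    import org.apache.hadoop.io.{LongWritable, Text}

    // Read the compressed files as text lines via the LZO-aware input format.
    val lines = sc.newAPIHadoopFile(
      "hdfs:///data/events.json.lzo",
      classOf[LzoTextInputFormat],
      classOf[LongWritable],
      classOf[Text]).map(_._2.toString)

    // jsonRDD infers the schema from the JSON strings, like jsonFile does.
    val events = sqlContext.jsonRDD(lines)
    events.printSchema()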
Hi spark users,
Do you know how to access rows nested inside a row?
I have a SchemaRDD called user and registered it as a table with the
following schema:
root
 |-- user_id: string (nullable = true)
 |-- item: array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- item_id:
== 1 }
res0: Int = 1
...else:
scala> items.count { case (user_id, name) => user_id == 1 }
res1: Int = 1
On Mon, Dec 15, 2014 at 11:04 AM, Jerry Lam chiling...@gmail.com wrote:
Hi spark users,
Do you know how to access rows nested inside a row?
I have a SchemaRDD called user and register
Hi spark users,
I'm trying to filter a json file that has the following schema using Spark
SQL:
root
 |-- user_id: string (nullable = true)
 |-- item: array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- item_id: string (nullable = true)
 |    |    |-- name:
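A sketch of one way to get at the nested rows in Spark SQL 1.x, producing the
(user_id, name) pairs used in the count shown earlier; it assumes `user` is
the SchemaRDD with the schema above, and that the item structs come back as
Seq[Row]:

    import org.apache.spark.sql.Row

    // Field positions follow the schema above: row(0) is user_id,
    // row(1) is the item array, item(1) is the struct's name field.
    val items = user.flatMap { row =>
      val userId = row(0).asInstanceOf[String]
      row(1).asInstanceOf[Seq[Row]].map(item => (userId, item(1).asInstanceOf[String]))
    }.collect()

    // Array.count takes a predicate and returns Int, as in the transcript.
    items.count { case (user_id, name) => user_id == "1" }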
, stacktraces, exceptions, etc.
TD
On Tue, Jul 15, 2014 at 10:07 AM, Jerry Lam chiling...@gmail.com
wrote:
Hi Rajesh,
I have a feeling that this is not directly related to spark but I
might be wrong. The reason why is that when you do:
Configuration configuration
Hi Rajesh,
Can you describe your Spark cluster setup? I saw localhost:2181 for
ZooKeeper.
Best Regards,
Jerry
On Tue, Jul 15, 2014 at 9:47 AM, Madabhattula Rajesh Kumar
mrajaf...@gmail.com wrote:
Hi Team,
Could you please help me resolve this issue?
*Issue*: I'm not able to connect
Hi Rajesh,
I have a feeling that this is not directly related to spark but I might be
wrong. The reason why is that when you do:
Configuration configuration = HBaseConfiguration.create();
by default, it reads the configuration file hbase-site.xml from your
classpath and ... (I don't remember
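In any case, when hbase-site.xml is not on the classpath, the client falls
back to defaults such as localhost:2181, which matches the symptom above. A
sketch of setting the ZooKeeper quorum explicitly (hostnames are
placeholders):

    import org.apache.hadoop.hbase.HBaseConfiguration

    val conf = HBaseConfiguration.create()
    // Placeholders: point these at the real ZooKeeper ensemble.
    conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com")
    conf.set("hbase.zookeeper.property.clientPort", "2181")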
Hi guys,
Sorry, I'm also interested in this nested json structure.
I have a similar SQL in which I need to query a nested field in a json.
Does the above query work if it is used with sql(sqlText), assuming the
data is coming directly from HDFS via sqlContext.jsonFile?
The SPARK-2483
Hi there,
I think the question is interesting; a spark of sparks = spark
I wonder if you can use the spark job server (
https://github.com/ooyala/spark-jobserver)?
So in the spark task that requires a new spark context, instead of creating
it in the task, contact the job server to create one and
Then yarn application -kill <appid> should work. This is what I did 2 hours ago.
Sorry I cannot provide more help.
Sent from my iPhone
On 14 Jul, 2014, at 6:05 pm, hsy...@gmail.com hsy...@gmail.com wrote:
yarn-cluster
On Mon, Jul 14, 2014 at 2:44 PM, Jerry Lam chiling...@gmail.com wrote
Hi Spark developers,
I have the following HQL queries for which Spark throws exceptions of this kind:
14/07/10 15:07:55 INFO TaskSetManager: Loss was due to
org.apache.spark.TaskKilledException [duplicate 17]
org.apache.spark.SparkException: Job aborted due to stage failure: Task
0.0:736 failed 4 times,
Hi Spark users and developers,
I'm doing some simple benchmarks with my team, and we found a potential
performance issue using Hive via Spark SQL. It is very bothersome, so your
help in understanding why it is so slow would be greatly appreciated.
First, we have some text files in HDFS which
By the way, I also tried hql("select * from m").count. It is terribly slow
too.
On Thu, Jul 10, 2014 at 5:08 PM, Jerry Lam chiling...@gmail.com wrote:
Hi Spark users and developers,
I'm doing some simple benchmarks with my team, and we found a potential
performance issue using Hive via
Hi Spark users,
To put the performance issue into perspective, we also ran the query directly
on Hive. It took about 5 minutes to run.
Best Regards,
Jerry
On Thu, Jul 10, 2014 at 5:10 PM, Jerry Lam chiling...@gmail.com wrote:
By the way, I also tried hql("select * from m").count. It is terribly
overhead, then there must be something additional
that Spark SQL adds to the overall overhead that Hive doesn't have.
Best Regards,
Jerry
On Thu, Jul 10, 2014 at 7:11 PM, Michael Armbrust mich...@databricks.com
wrote:
On Thu, Jul 10, 2014 at 2:08 PM, Jerry Lam chiling...@gmail.com wrote
provide the
output of the following command:
println(hql("select s.id from m join s on (s.id=m_id)").queryExecution)
Michael
On Thu, Jul 10, 2014 at 8:15 AM, Jerry Lam chiling...@gmail.com wrote:
Hi Spark developers,
I have the following HQL queries for which Spark throws exceptions of this kind:
14
+1 as well for being able to submit jobs programmatically without using a
shell script.
We have also experienced issues submitting jobs programmatically without
using spark-submit. In fact, even in the Hadoop world, I rarely used hadoop
jar to submit jobs from a shell.
On Wed, Jul 9, 2014 at 9:47 AM,
that defines how my application
should look. In my humble opinion, using Spark as an embeddable library
rather than as the main framework and runtime is much easier.
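For illustration, a minimal sketch of that embeddable style: create the
SparkContext inside the application instead of going through spark-submit
(master URL and app name are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholders: point the master URL at a real cluster, or "local[*]".
    val conf = new SparkConf()
      .setMaster("spark://master:7077")
      .setAppName("embedded-app")
    val sc = new SparkContext(conf)
    try {
      println(sc.parallelize(1 to 100).sum())
    } finally {
      sc.stop()
    }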
On Wed, Jul 9, 2014 at 5:14 PM, Jerry Lam chiling...@gmail.com wrote:
+1 as well for being able to submit jobs programmatically without
Hi Konstantin,
I just ran into the same problem. I mitigated the issue by reducing the
number of cores when I executed the job; otherwise it would not be able to
finish.
Contrary to what many people believe, it does not necessarily mean that you
were running out of memory. A better answer can be found here:
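For reference, one way to cap the cores from the application itself is the
spark.cores.max setting (applies to standalone and Mesos clusters; the value
is an arbitrary example):

    import org.apache.spark.SparkConf

    // spark.cores.max limits the total cores the application claims on the
    // cluster; 8 is just an example value.
    val conf = new SparkConf().set("spark.cores.max", "8")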
Hi guys,
I ended up reserving a room at the Phoenix (Hotel:
http://www.jdvhotels.com/hotels/california/san-francisco-hotels/phoenix-hotel)
recommended by my friend who has been in SF.
According to Google, it takes 11 minutes to walk to the conference, which is
not too bad.
Hope this helps!
Jerry
Hi Spark users,
Do you guys plan to go to the Spark Summit? Can you recommend any hotel near
the conference? I'm not familiar with the area.
Thanks!
Jerry
Hi Dave,
This is the HBase solution to the poor scan performance issue:
https://issues.apache.org/jira/browse/HBASE-8369
I encountered the same issue before.
To the best of my knowledge, this is not a MapReduce issue; it is an HBase
issue. If you are planning to swap out MapReduce and replace it with
Hi Shark,
Should I assume that Shark users should not use the Shark APIs since there
is no documentation for them? If there is documentation, can you point it
out?
Best Regards,
Jerry
On Thu, Apr 3, 2014 at 9:24 PM, Jerry Lam chiling...@gmail.com wrote:
Hello everyone,
I have