Hello,
Any ideas about this one: https://issues.apache.org/jira/browse/SPARK-13979 ?
Do others see the same issue?
Thanks
Gil.
We recently released an object store connector for Spark.
https://github.com/SparkTC/stocator
Currently this connector contains a driver for Swift-based object stores
(like SoftLayer or any other Swift cluster), but it can easily support
additional object stores.
There is a pending patch to
Hi,
I have the following case, which I am not sure how to resolve.
My code uses HadoopRDD and creates various RDDs on top of it
(MapPartitionsRDD, and so on).
After all the RDDs have been lazily created, my code "knows" some new
information, and I want the "compute" method of the HadoopRDD to be
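For context, a minimal sketch of that setup (paths and transformations
are illustrative; sc is an existing SparkContext, e.g. from spark-shell):

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.TextInputFormat

    // hadoopFile is backed by a HadoopRDD.
    val base = sc.hadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///tmp/in")
    // Each transformation lazily wraps it in a MapPartitionsRDD; nothing runs yet.
    val lines    = base.map { case (_, text) => text.toString }
    val nonEmpty = lines.filter(_.nonEmpty)
    // compute() of the underlying HadoopRDD only executes once an action fires:
    nonEmpty.count()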
From: mohitja...@gmail.com
To: Gil Vernik/Haifa/IBM@IBMIL
Cc: Dev dev@spark.apache.org
Date: 19/08/2015 21:47
Subject: Re: [spark-csv] how to build with Hadoop 2.6.0?
spark-csv should not depend on Hadoop.
On Sun, Aug 16, 2015 at 9:05 AM, Gil Vernik g...@il.ibm.com wrote:
I would like to build spark-csv with Hadoop 2.6.0.
I noticed that when I build it with sbt/sbt ++2.10.4 package, it builds
with Hadoop 2.2.0 (at least that is what I saw in the .ivy2 repository).
How do I specify 2.6.0 during the spark-csv build? By the way, is it
possible to build spark-csv using
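(For reference, a minimal sketch of one way to pin the Hadoop version in
an sbt build; this assumes a plain build.sbt and may not match spark-csv's
actual build definition:)

    // build.sbt fragment -- hypothetical override of the transitive
    // hadoop-client version pulled in via spark-core.
    dependencyOverrides += "org.apache.hadoop" % "hadoop-client" % "2.6.0"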
sparkContext.hadoopFile, then FileInputFormat will provide all the
partitions and splits, but if I access the same bucket from some code
that relies on HadoopFSRelation, then the partitions will be created by
HadoopFSRelation?
Thanks
Gil.
From: Cheng Lian lian.cs@gmail.com
To: Gil Vernik/Haifa/IBM
Just some thoughts, hope I didn't miss something obvious.
HadoopFSRelation calls the FileSystem class directly to list files in the
path.
It looks like it basically implements the same logic as the
FileInputFormat.listStatus method (located in
hadoop-mapreduce-client-core).
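(A minimal sketch of that listing path through the standard Hadoop
FileSystem API; the path below is illustrative:)

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // List a directory the way HadoopFSRelation does: straight through the
    // FileSystem API rather than FileInputFormat.listStatus.
    val conf = new Configuration()
    val dir  = new Path("swift://mybucket.provider/data")  // illustrative
    val fs   = dir.getFileSystem(conf)
    fs.listStatus(dir).foreach(status => println(status.getPath))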
The point is
dependence of it.
From: Ted Yu yuzhih...@gmail.com
To: Josh Rosen joshro...@databricks.com
Cc: Steve Loughran ste...@hortonworks.com, Gil
Vernik/Haifa/IBM@IBMIL, Dev dev@spark.apache.org
Date: 15/07/2015 18:28
Subject: Re: problems with build of the latest master
If I
From: Sean Owen so...@cloudera.com
To: Gil Vernik/Haifa/IBM@IBMIL
Cc: Ted Yu yuzhih...@gmail.com, Dev dev@spark.apache.org, Josh
Rosen joshro...@databricks.com, Steve Loughran ste...@hortonworks.com
Date: 15/07/2015 21:41
Subject: Re: problems with build of the latest master
I just checked out the master and tried to build it with
mvn -Dhadoop.version=2.6.0 -DskipTests clean package
Got:
[ERROR]
/Users/gilv/Dev/Spark/spark/core/src/test/java/org/apache/spark/shuffle/unsafe/UnsafeShuffleWriterSuite.java:117:
error: cannot find symbol
[ERROR]
?
For example, if I create a DataFrame from a HadoopRDD, does that mean the
DataFrame has the same partitions as the HadoopRDD?
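(A quick way to check, as a sketch against the Spark 1.3-era API; names
and paths are illustrative:)

    import org.apache.spark.sql.SQLContext

    case class Record(line: String)

    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val rdd = sc.textFile("hdfs:///tmp/data.txt")  // backed by a HadoopRDD
    val df  = rdd.map(Record(_)).toDF()

    println(rdd.partitions.length)     // partitions of the source RDD
    println(df.rdd.partitions.length)  // partitions the DataFrame sees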
Thanks
Gil.
From: Gil Vernik/Haifa/IBM@IBMIL
To: Dev dev@spark.apache.org
Date: 12/07/2015 13:06
Subject: question related to partitions of the DataFrame
Hi,
DataFrame
for Hadoop version 2.6.0, but perhaps the latest Hadoop
versions have the same Mockito versions as Spark uses.
Gil Vernik.
From: Gil Vernik/Haifa/IBM@IBMIL
To: Dev dev@spark.apache.org
Date: 14/07/2015 12:23
Subject: problems with build of the latest master
I just checked out
Hi,
DataFrame extends RDDApi, which provides RDD-like methods.
My question is: is a DataFrame a sort of standalone RDD with its own
partitions, or does it depend on the underlying RDD that was used to load
the data into its partitions? It's written that DataFrame has the ability
to scale from
Hi All,
I wanted to experiment a little bit with TableScan and PrunedScan.
My first test was to print columns from various SQL queries.
To make this test easier, I just took spark-csv and replaced TableScan
with PrunedScan.
I then changed the buildScan method of CsvRelation from
def buildScan
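(For reference, a minimal illustrative relation against the Spark
1.3-era sources API; the class and data here are hypothetical and
CsvRelation's parsing is elided:)

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.sources.{BaseRelation, PrunedScan}
    import org.apache.spark.sql.types._

    // TableScan declares:   def buildScan(): RDD[Row]
    // PrunedScan declares:  def buildScan(requiredColumns: Array[String]): RDD[Row]
    class DemoRelation(@transient val sqlContext: SQLContext)
        extends BaseRelation with PrunedScan {

      override def schema: StructType = StructType(Seq(
        StructField("a", StringType), StructField("b", StringType)))

      override def buildScan(requiredColumns: Array[String]): RDD[Row] = {
        // Print the columns the query actually asked for.
        println("pruned columns: " + requiredColumns.mkString(", "))
        val data = Seq(Map("a" -> "1", "b" -> "2"))
        sqlContext.sparkContext.parallelize(data).map { m =>
          Row.fromSeq(requiredColumns.map(m))
        }
      }
    }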
To: Gil Vernik/Haifa/IBM@IBMIL
Cc: dev dev@spark.apache.org
Date: 15/04/2015 06:20 PM
Subject: Re: saveAsTextFile and tmp file generation in tasks
The temp file creation is controlled by a Hadoop OutputCommitter, which is
normally FileOutputCommitter by default. It's used
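(A minimal sketch of where that hook lives, using the old mapred API;
this just restates the default rather than changing behavior:)

    import org.apache.hadoop.mapred.{FileOutputCommitter, JobConf}

    // The committer is a job-level Hadoop setting. FileOutputCommitter (the
    // default) writes task output under a _temporary dir, then renames it
    // into place on commit.
    val jobConf = new JobConf(sc.hadoopConfiguration)
    jobConf.setOutputCommitter(classOf[FileOutputCommitter])
    // The saveAsHadoopFile / saveAsHadoopDataset variants accept this JobConf.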
be
created in memory?
And the last one: where is the code that is responsible for this?
Thanks a lot,
Gil Vernik.
Hi,
I am trying to better understand the code for Parquet support.
In particular, I got lost trying to understand ParquetRelation and
ParquetRelation2. Is ParquetRelation2 the new code that should
completely replace ParquetRelation? (I think there is a remark in the
code noting this
to be accessed via file://
?
I will be glad to dig into this in case it's a bug, but would like to know
if this is something intentional in Spark 1.3.0.
(I can access the swift:// namespace from SparkContext; only sqlContext
has this issue.)
Thanks,
Gil Vernik.
scala> val parquetFile
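(A hedged repro sketch of the call in question, against the Spark
1.3-era API; the container and object names are illustrative:)

    // sqlContext.parquetFile resolves the path through the relation code
    // discussed above, which is where swift:// appeared to be mishandled.
    val parquetFile = sqlContext.parquetFile("swift://data.provider/events.parquet")
    parquetFile.count()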
I just noticed this one:
https://issues.apache.org/jira/browse/SPARK-6351
https://github.com/apache/spark/pull/5039
I verified it, and it resolves my issues with Parquet and the swift://
namespace.
From: Gil Vernik/Haifa/IBM@IBMIL
To: dev dev@spark.apache.org
Date: 16/03/2015
for
download were built with Jackson 1.8.8, which makes them impossible to use
with Hadoop 2.6.0 jars.
Thanks
Gil Vernik.
From: Sean Owen so...@cloudera.com
To: Ted Yu yuzhih...@gmail.com
Cc: Gil Vernik/Haifa/IBM@IBMIL, dev dev@spark.apache.org
Date: 18/01/2015 08:23 PM
Subject
particular need in Spark for Jackson 1.8.8 and not 1.9.13?
Can we remove 1.8.8 and use 1.9.13 for Avro?
It looks to me that everything works fine when Spark is built with Jackson
1.9.13, but I am not an expert and am not sure what should be tested.
Thanks,
Gil Vernik.
greatly for the exposure of Spark.
The integration between Spark and Swift is very similar to how Spark
integrates with S3.
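(As a hedged sketch of what that looks like from the Spark side, using
hadoop-openstack's swift:// scheme; the provider name, key, and endpoint
are illustrative and deployment-specific:)

    // Configure a hypothetical Swift provider ("provider"); the exact
    // fs.swift.service.<name>.* keys depend on your deployment.
    sc.hadoopConfiguration.set(
      "fs.swift.service.provider.auth.url",
      "https://auth.example.com/v2.0/tokens")
    // Reading then looks just like S3:
    val data = sc.textFile("swift://container.provider/path/data.txt")
    data.take(5).foreach(println)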
It will be great to hear comments / suggestions / remarks from the community!
All the best,
Gil Vernik.
branches?
Thanking you in advance,
Gil Vernik.