Spark-13979: issues with hadoopConf

2016-07-03 Thread Gil Vernik
Hello, any ideas about this one: https://issues.apache.org/jira/browse/SPARK-13979 ? Do others see the same issue? Thanks, Gil.

new object store driver for Spark

2016-03-22 Thread Gil Vernik
We recently released an object store connector for Spark. https://github.com/SparkTC/stocator Currently this connector contains a driver for Swift-based object stores (like SoftLayer or any other Swift cluster), but it can easily support additional object stores. There is a pending patch to

how to send additional configuration to the RDD after it was lazily created

2015-09-17 Thread Gil Vernik
Hi, I have the following case, which I am not sure how to resolve. My code uses HadoopRDD and creates various RDDs on top of it (MapPartitionsRDD, and so on). After all the RDDs have been lazily created, my code "knows" some new information, and I want the "compute" method of the HadoopRDD to be

Re: [spark-csv] how to build with Hadoop 2.6.0?

2015-08-19 Thread Gil Vernik
mohitja...@gmail.com To: Gil Vernik/Haifa/IBM@IBMIL Cc: Dev dev@spark.apache.org Date: 19/08/2015 21:47 Subject: Re: [spark-csv] how to build with Hadoop 2.6.0? spark-csv should not depend on hadoop. On Sun, Aug 16, 2015 at 9:05 AM, Gil Vernik g...@il.ibm.com wrote: I would like

[spark-csv] how to build with Hadoop 2.6.0?

2015-08-16 Thread Gil Vernik
I would like to build spark-csv with Hadoop 2.6.0. I noticed that when I build it with sbt/sbt ++2.10.4 package, it is built with Hadoop 2.2.0 (at least this is what I saw in the .ivy2 repository). How do I specify 2.6.0 during the spark-csv build? By the way, is it possible to build spark-csv using

Re: possible issues with listing objects in the HadoopFSrelation

2015-08-12 Thread Gil Vernik
sparkContext.hadoopFile then FileInputFormat will provide all the partitions and splits, but if I access the same bucket from some code that relies on HadoopFSRelation, then the partitions will be created by HadoopFSRelation? Thanks, Gil. From: Cheng Lian lian.cs@gmail.com To: Gil Vernik/Haifa/IBM

possible issues with listing objects in the HadoopFSrelation

2015-08-10 Thread Gil Vernik
Just some thoughts, hope I didn't miss something obvious. HadoopFSRelation calls the FileSystem class directly to list files in the path. It looks like it implements basically the same logic as the FileInputFormat.listStatus method (located in hadoop-mapreduce-client-core). The point is

Re: problems with build of latest the master

2015-07-15 Thread Gil Vernik
dependence of it. From: Ted Yu yuzhih...@gmail.com To: Josh Rosen joshro...@databricks.com Cc: Steve Loughran ste...@hortonworks.com, Gil Vernik/Haifa/IBM@IBMIL, Dev dev@spark.apache.org Date: 15/07/2015 18:28 Subject: Re: problems with build of latest the master If I

Re: problems with build of latest the master

2015-07-15 Thread Gil Vernik
. From: Sean Owen so...@cloudera.com To: Gil Vernik/Haifa/IBM@IBMIL Cc: Ted Yu yuzhih...@gmail.com, Dev dev@spark.apache.org, Josh Rosen joshro...@databricks.com, Steve Loughran ste...@hortonworks.com Date: 15/07/2015 21:41 Subject: Re: problems with build of latest the master

problems with build of latest the master

2015-07-14 Thread Gil Vernik
I just did a checkout of master and tried to build it with mvn -Dhadoop.version=2.6.0 -DskipTests clean package. Got: [ERROR] /Users/gilv/Dev/Spark/spark/core/src/test/java/org/apache/spark/shuffle/unsafe/UnsafeShuffleWriterSuite.java:117: error: cannot find symbol [ERROR]

Re: question related partitions of the DataFrame

2015-07-14 Thread Gil Vernik
? For example, if I create a DataFrame from a HadoopRDD, does it mean that the DataFrame has the same partitions as the HadoopRDD? Thanks, Gil. From: Gil Vernik/Haifa/IBM@IBMIL To: Dev dev@spark.apache.org Date: 12/07/2015 13:06 Subject: question related partitions of the DataFrame Hi, DataFrame

Re: problems with build of latest the master

2015-07-14 Thread Gil Vernik
for Hadoop version 2.6.0, but perhaps the latest Hadoop versions have the same mockito version as Spark uses. Gil Vernik. From: Gil Vernik/Haifa/IBM@IBMIL To: Dev dev@spark.apache.org Date: 14/07/2015 12:23 Subject: problems with build of latest the master I just did a checkout

question related partitions of the DataFrame

2015-07-12 Thread Gil Vernik
Hi, DataFrame extends RDDApi, which provides RDD-like methods. My question is: is a DataFrame a sort of standalone RDD with its own partitions, or does it depend on the underlying RDD that was used to load the data into its partitions? It's written that DataFrame has the ability to scale from

TableScan vs PrunedScan

2015-07-07 Thread Gil Vernik
Hi All, I wanted to experiment a little bit with TableScan and PrunedScan. My first test was to print the columns from various SQL queries. To make this test easier, I just took spark-csv and replaced TableScan with PrunedScan. I then changed the buildScan method of CsvRelation from def BuildScan

Re: saveAsTextFile and tmp files generations in tasks

2015-04-15 Thread Gil Vernik
To: Gil Vernik/Haifa/IBM@IBMIL Cc: dev dev@spark.apache.org Date: 15/04/2015 06:20 PM Subject: Re: saveAsTextFile and tmp files generations in tasks The temp file creation is controlled by a Hadoop OutputCommitter, which is normally FileOutputCommitter by default. It's used

saveAsTextFile and tmp files generations in tasks

2015-04-14 Thread Gil Vernik
be created in memory? And the last one: where is the code that is responsible for this? Thanks a lot, Gil Vernik.

parquet support - some questions about code

2015-03-18 Thread Gil Vernik
Hi, I am trying to better understand the code for Parquet support. In particular, I got lost trying to understand ParquetRelation and ParquetRelation2. Is ParquetRelation2 the new code that should completely replace ParquetRelation? (I think there is some remark in the code noting this

problems with Parquet in Spark 1.3.0

2015-03-16 Thread Gil Vernik
to be accessed via file:// ? I will be glad to dig into this in case it's a bug, but would like to know if this is something intentional in Spark 1.3.0 (I can access the swift:// namespace from SparkContext; only sqlContext has this issue). Thanks, Gil Vernik. scala> val parquetFile

Re: problems with Parquet in Spark 1.3.0

2015-03-16 Thread Gil Vernik
I just noticed this one: https://issues.apache.org/jira/browse/SPARK-6351 https://github.com/apache/spark/pull/5039 I verified it, and it resolves my issues with Parquet and the swift:// namespace. From: Gil Vernik/Haifa/IBM@IBMIL To: dev dev@spark.apache.org Date: 16/03/2015

Re: run time exceptions in Spark 1.2.0 manual build together with OpenStack hadoop driver

2015-02-09 Thread Gil Vernik
for download were built with jackson 1.8.8, which makes them impossible to use with Hadoop 2.6.0 jars. Thanks, Gil Vernik. From: Sean Owen so...@cloudera.com To: Ted Yu yuzhih...@gmail.com Cc: Gil Vernik/Haifa/IBM@IBMIL, dev dev@spark.apache.org Date: 18/01/2015 08:23 PM Subject

run time exceptions in Spark 1.2.0 manual build together with OpenStack hadoop driver

2015-01-17 Thread Gil Vernik
particular need in Spark for jackson 1.8.8 and not 1.9.13? Can we remove 1.8.8 and use 1.9.13 for Avro? It looks to me that everything works fine when Spark is built with jackson 1.9.13, but I am not an expert and am not sure what should be tested. Thanks, Gil Vernik.

Apache Spark and Swift object store

2014-06-08 Thread Gil Vernik
greatly for the exposure of Spark. The integration between Spark and Swift is very similar to how Spark integrates with S3. It would be great to hear comments / suggestions / remarks from the community! All the best, Gil Vernik.

question about Spark repositories in GitHub

2014-05-19 Thread Gil Vernik
branches? Thanking you in advance, Gil Vernik.