Hi,
I am contemplating the use of Hadoop with Java 8 in a production system. I
will be using Apache Spark for doing most of the computations on data stored
in HBase.
Although Hadoop seems to support JDK 8 with some tweaks, the official HBase
site states the following for version 0.98:
Running
Ah, thanks.
On Tue, Aug 26, 2014 at 7:32 PM, Nan Zhu zhunanmcg...@gmail.com wrote:
Hi, Victor,
the issue with having different versions on the driver and the cluster is that
the master will shut down your application due to the inconsistent
serialVersionUID in ExecutorState
Best,
--
Nan
The framework has that info to manage cluster status, and this info (e.g.
worker count) is also available through the Spark metrics system.
From the user application's point of view, can you give an example of why
you need this info, and what would you plan to do with it?
Best Regards,
Maybe this would interest you:
CPU and GPU-accelerated Machine Learning Library:
https://github.com/BIDData/BIDMach
2014-08-27 4:08 GMT+02:00 Matei Zaharia matei.zaha...@gmail.com:
You should try to find a Java-based library, then you can call it from
Scala.
Matei
On August 26, 2014 at
Hi,
*Is there a way to insert data into an existing parquet file using Spark?*
I am using Spark Streaming and Spark SQL to store real-time data into
parquet files and then query them using Impala.
Spark creates multiple subdirectories of parquet files, which makes
loading them a challenge.
Like Mayur said, it's better to use mapPartitions instead of map.
Here's a piece of code which typically reads a text file and inserts each
row into the database. I haven't tested it; it might throw up some
serialization errors, in that case you've got to serialize them!
JavaRDD<String>
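A rough Scala sketch of that per-partition insert pattern (untested; the JDBC URL, table name, and having the MySQL driver jar on the executor classpath are all assumptions):

import java.sql.DriverManager

val lines = sc.textFile("hdfs:///path/to/input.txt")
val insertedCounts = lines.mapPartitions { rows =>
  // Open one connection per partition instead of one per record.
  val conn = DriverManager.getConnection("jdbc:mysql://dbhost:3306/mydb", "user", "pass")
  val stmt = conn.prepareStatement("INSERT INTO lines_table (line) VALUES (?)")
  val inserted = rows.map { row =>
    stmt.setString(1, row)
    stmt.executeUpdate()
  }.sum
  stmt.close()
  conn.close()
  Iterator(inserted)
}
// mapPartitions is lazy; force it with an action.
println("rows inserted: " + insertedCounts.reduce(_ + _))

If you don't need a result back, foreachPartition does the same thing as an action directly.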
Hey guys, so the problem i'm trying to tackle is the following:
- I need a data source that emits messages at a certain frequency
- There are N neural nets that need to process each message individually
- The outputs from all neural nets are aggregated and only when all N
outputs for each message
Hi all
When I ran a simple SQL query, I encountered the following error.
hive:0.12(metastore in mysql)
hadoop 2.4.1
spark 1.0.2 build with hive
my hql code
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql._
import org.apache.spark.sql.hive.LocalHiveContext
object
Thank you for your answers, and sorry for my lack of understanding.
So I tried what you suggested, with/without unpersisting and with .cache()
(I also tried persist(StorageLevel.MEMORY_AND_DISK), but this is not allowed for msg
because you apparently can't change the storage level) for msg, g and
newVerts,
Hello all,
I am able to use Spark in the shell but I am not able to run a spark file. I am
using sbt and the jar is created but even the SimpleApp class example given on
the site http://spark.apache.org/docs/latest/quick-start.html is not running. I
installed a prebuilt version of spark and
Hi
I have a three node spark cluster. I restricted the resources per
application by setting appropriate parameters and I could run two
applications simultaneously. Now, I want to replicate an RDD and run two
applications simultaneously. Can someone help me figure out how to go about doing this?
I replicated
The message java.io.IOException: Could not locate executable
null\bin\winutils.exe
indicates that null was substituted when expanding or replacing an
environment variable.
I'm guessing that you are missing *HADOOP_HOME* in the environment
variables.
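If you only want to run locally and don't have a full Hadoop install, one workaround (a sketch, assuming you download winutils.exe and place it under C:\hadoop\bin) is to set hadoop.home.dir before creating the SparkContext:

import org.apache.spark.{SparkConf, SparkContext}

// Workaround sketch for local runs on Windows; the C:\hadoop path is an assumption.
// winutils.exe must live under that directory's bin folder, i.e. C:\hadoop\bin\winutils.exe.
System.setProperty("hadoop.home.dir", "C:\\hadoop")

val conf = new SparkConf().setAppName("LocalTest").setMaster("local[*]")
val sc = new SparkContext(conf)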
Thanks
Best Regards
On Wed, Aug 27, 2014
Thank you for the reply, Matei.
Is there something we missed? I am able to run the Spark instance on my
local system, i.e. Windows 7, but the same set of steps does not allow me to run it
on a Windows Server 2012 machine. The black screen just appears for a fraction
of a second and disappears,
What should I put as the value of that environment variable? I want to run the
scripts locally on my machine and do not have any Hadoop installed.
Thank you
From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Mittwoch, 27. August 2014 12:54
To: Hingorani, Vineet
Cc: user@spark.apache.org
Dear all,
I'm looking for an efficient way to manage external dependencies. I know
that one can add .jar or .py dependencies easily, but how can I handle other
types of dependencies? Specifically, I have some data processing algorithms
implemented in other languages (Ruby, Octave, MATLAB, C++) and
It should point to your Hadoop installation directory (like C:\hadoop\).
Since you don't have Hadoop installed, what is the code that you are
running?
Thanks
Best Regards
On Wed, Aug 27, 2014 at 4:50 PM, Hingorani, Vineet vineet.hingor...@sap.com
wrote:
What should I put the value of that
Thank you Matei.
I found a solution using pipe and the MATLAB engine (an executable that can
call MATLAB behind the scenes and uses stdin and stdout to communicate). I
just need to fix two other issues:
- how can I handle my dependencies? My MATLAB script needs other MATLAB
files that need to be
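For reference, a rough sketch of the pipe approach (untested; run_matlab.sh is a hypothetical wrapper that starts the MATLAB engine, reads records on stdin and writes results to stdout, one line per record):

val input = sc.textFile("hdfs:///data/records.txt")
// Each partition's records are fed to the external process's stdin;
// its stdout lines come back as a new RDD of strings.
val results = input.pipe("/path/to/run_matlab.sh")
results.saveAsTextFile("hdfs:///data/results")

Dependent files can be shipped to the workers with sc.addFile(...) and located on each worker via SparkFiles.get(...).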
The code is the example given on the Spark site:
/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object SimpleApp {
def main(args: Array[String]) {
val logFile =
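For reference, the rest of that quick-start example looks roughly like this (the logFile path is whatever file you point it at):

/* SimpleApp.scala -- roughly the quick-start example */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "YOUR_SPARK_HOME/README.md" // should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}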
Hi All,
I am using spark-1.0.0 to parse a JSON file and save the values to Cassandra
using a case class.
My code looks as follows:
case class LogLine(x1:Option[String],x2:
Option[String],x3:Option[List[String]],x4:
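For what it's worth, a stripped-down sketch of that setup (untested; it assumes the DataStax spark-cassandra-connector is on the classpath, each input line is a JSON object, and the keyspace, table and field names are placeholders):

import com.datastax.spark.connector._
import scala.util.parsing.json.JSON

case class LogLine(x1: Option[String], x2: Option[String])

// Parse each JSON line into the case class, dropping lines that fail to parse.
val parsed = sc.textFile("hdfs:///logs/input.json").flatMap { line =>
  JSON.parseFull(line).map { any =>
    val fields = any.asInstanceOf[Map[String, Any]]
    LogLine(fields.get("x1").map(_.toString), fields.get("x2").map(_.toString))
  }
}

// Columns x1 and x2 are mapped from the case class fields by name.
parsed.saveToCassandra("my_keyspace", "log_lines")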
(apologies for sending this twice, first via nabble; didn't realize it
wouldn't get forwarded)
Hey, I know it's not officially released yet, but I'm trying to understand
(and run) the Thrift-based JDBC server, in order to enable remote JDBC
access to our dev cluster.
Before asking about details,
I got it working, Matei.
Thank you. I was giving the wrong directory path. Thank you...!!
Thanks,
Abhishek Mishra
-Original Message-
From: Mishra, Abhishek [mailto:abhishek.mis...@xerox.com]
Sent: Wednesday, August 27, 2014 4:38 PM
To: Matei Zaharia
Cc: user@spark.apache.org
Subject: RE:
You can install Hadoop 2 by reading this doc:
https://wiki.apache.org/hadoop/Hadoop2OnWindows Once you are done with it,
you can set the environment variable HADOOP_HOME and then it should work.
Also, not sure if it will work, but can you put file:// at the front and
give it a go? I don't see any
It didn't work after adding file:// at the front. I compiled it again and ran
it. The same errors are coming. Do you think there can be some problem with the
Java dependency? Also, I don't want to install Hadoop; I just want to run it on
my local machine. The reason is, whenever I install these
What Spark and Hadoop versions are you on? I have it working in my Spark
app with the parquet-hive-bundle-1.5.0.jar bundled into my app fat-jar.
I'm running Spark 1.0.2 and CDH5.
bin/spark-shell --master local[*] --driver-class-path
~/parquet-hive-bundle-1.5.0.jar
To see if that works?
On
Hello everyone,
Is it possible to use an external data structure, such as Saddle, in
Spark? As far as I know, an RDD is a kind of wrapper or container that has
a certain data structure inside. So I was wondering whether this data
structure has to be either a basic (or native) structure or any
Thank you Akhil and Mayur.
It will be really helpful.
Thanks,
On 27 Aug 2014 13:19, Akhil Das ak...@sigmoidanalytics.com wrote:
Like Mayur said, it's better to use mapPartitions instead of map.
Here's a piece of code which typically reads a text file and inserts each
row into the database. I
All,
Does anyone have specific references to customers, use cases and large-scale
deployments of Spark Streaming? By 'large scale' I mean both throughput and
number of nodes. I'm attempting an objective comparison of Streaming and
Storm, and while this data is known for Storm, there appears to
As suggested in the error messages, double-check your class path.
From: CharlieLin chury...@gmail.com
Date: Tuesday, August 26, 2014 at 8:29 PM
To: user@spark.apache.org
Subject: Execute
I have long-lived state I'd like to maintain on the executors, which I'd like
to initialize during some bootstrap phase, and I'd like to update the master when
such an executor leaves the cluster.
On Tue, Aug 26, 2014 at 11:18 PM, Liu, Raymond raymond@intel.com
wrote:
The framework has that info to
I have a similar requirement to export data to MySQL. Just wanted to know
what the best approach is so far after the research you guys have done.
Currently I'm thinking of saving to HDFS and using Sqoop to handle the export. Is that
the best approach, or is there another way to write to MySQL? Thanks!
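If you'd rather skip the HDFS + Sqoop hop, writing directly over JDBC from each partition is another option. A rough, untested sketch (the connection details, table name, and the shape of resultRdd as key/value string pairs are all assumptions):

import java.sql.DriverManager

resultRdd.foreachPartition { rows =>
  // One connection per partition, batched inserts for throughput.
  val conn = DriverManager.getConnection("jdbc:mysql://dbhost:3306/mydb", "user", "pass")
  val stmt = conn.prepareStatement("INSERT INTO results (k, v) VALUES (?, ?)")
  try {
    rows.foreach { case (k, v) =>
      stmt.setString(1, k)
      stmt.setString(2, v)
      stmt.addBatch()
    }
    stmt.executeBatch()
  } finally {
    stmt.close()
    conn.close()
  }
}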
Thank you all. Actually I was looking at JCUDA. Function-wise this may be
a perfect solution to offload computation to the GPU. We'll see how it performs,
especially with the Java binding.
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J.
I have the same issue (I'm using the latest 1.1.0-SNAPSHOT).
I've increased my driver memory to 30G, executor memory to 10G,
and spark.akka.askTimeout to 180. Still no good. My other configurations
are:
spark.serializer
org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.mb
I solved this issue by putting the hbase-protocol jar in the Hadoop classpath, and not
in the Spark classpath.
export HADOOP_CLASSPATH=/path/to/jar/hbase-protocol-0.98.1-cdh5.1.0.jar
On Tue, Aug 26, 2014 at 5:42 PM, Ashish Jain ashish@gmail.com wrote:
Hello,
I'm using the following version of
It looks like the issue I had is that I didn't pull the htrace-core jar into
the Spark classpath.
Hi Wei,
Please keep us posted about the performance result you get. This would
be very helpful.
Best,
Xiangrui
On Wed, Aug 27, 2014 at 10:33 AM, Wei Tan w...@us.ibm.com wrote:
Thank you all. Actually I was looking at JCUDA. Function wise this may be a
perfect solution to offload computation
It was my mistake; somehow I had added the above-mentioned class as the
io.compression.codec property value. The problem is resolved now.
Thanks and Regards,
Sankar S.
On Wednesday, 27 August 2014, 1:23, S Malligarjunan smalligarju...@yahoo.com
wrote:
Hello all,
I have just checked out
Hi Dibyendu,
That would be great. One of the biggest drawbacks of the Kafka utils as well as
your implementation is that I am unable to scale out processing. I am
relatively new to Spark and Spark Streaming - from what I read and what I
observe with my deployment, having the RDD created on one
Hi All,
I am planning to run the AMPLab benchmark suite to evaluate the performance of our
cluster. I looked at https://amplab.cs.berkeley.edu/benchmark/ and it mentions
data availability at:
s3n://big-data-benchmark/pavlo/[text|text-deflate|sequence|sequence-snappy]/[suffix] where
/tiny/,
Hi Sameer,
I've faced this issue before. They don't show up on
http://s3.amazonaws.com/big-data-benchmark/. But you can directly use:
`sc.textFile("s3n://big-data-benchmark/pavlo/text/tiny/crawl")`
The gotcha is that you also need to supply which dataset you want: crawl,
uservisits, or rankings
Thanks a lot. Finally, I can create a parquet table using your suggested
--driver-class-path option.
I am using hadoop 2.3. Now, I will try to load data into the tables.
Thanks,
lyc
Hi Burak, thanks, I will then start benchmarking the cluster.
Date: Wed, 27 Aug 2014 11:52:05 -0700
From: bya...@stanford.edu
To: ssti...@live.com
CC: user@spark.apache.org
Subject: Re: Amplab: big-data-benchmark
Hi Sameer,
I've faced this issue before. They don't show up on
I'll note the parquet jars are included by default in 1.1
On Wed, Aug 27, 2014 at 11:53 AM, lyc yanchen@huawei.com wrote:
Thanks a lot. Finally, I can create parquet table using your command
-driver-class-path.
I am using hadoop 2.3. Now, I will try to load data into the tables.
I would expect that to work. What exactly is the error?
On Wed, Aug 27, 2014 at 6:02 AM, Matt Chu m...@kabam.com wrote:
(apologies for sending this twice, first via nabble; didn't realize it
wouldn't get forwarded)
Hey, I know it's not officially released yet, but I'm trying to understand
Can you tell which nodes were doing the computation in each case?
Date: Wed, 27 Aug 2014 20:29:38 +0530
Subject: Execution time increasing with increase of cluster size
From: sarathchandra.jos...@algofusiontech.com
To: user@spark.apache.org
Hi,
I've written a simple scala program which reads a
You need to have the datanucleus jars on your classpath. It is not okay to
merge them into an uber jar.
On Wed, Aug 27, 2014 at 1:44 AM, centerqi hu cente...@gmail.com wrote:
Hi all
When I run a simple SQL, encountered the following error.
hive:0.12(metastore in mysql)
hadoop 2.4.1
Looking for fellow Spark enthusiasts based in and around Research Triangle
Park, Raleigh, Durham, and Chapel Hill, North Carolina
Please get in touch off list for an employment opportunity. Must be local.
Thanks!
-Andrew
-
Hi All, I was wondering if someone could please tell me the status of MLbase and its
roadmap in terms of software releases. We are very interested in exploring it
for our applications.
Hi, I asked a similar question before and didn't get any answers, so I'll
try again:
I am using updateStateByKey, pretty much exactly as shown in the examples
shipping with Spark:
def createContext(master:String,dropDir:String, checkpointDirectory:String) = {
val updateFunc = (values:
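For comparison, the update function in the bundled StatefulNetworkWordCount example looks roughly like this (wordDstream below stands for a DStream[(String, Int)] and is an assumption about your pipeline):

// Sum the new values seen in this batch with the running count kept in state.
// updateStateByKey also requires a checkpoint directory, e.g. ssc.checkpoint(checkpointDirectory).
val updateFunc = (values: Seq[Int], state: Option[Int]) => {
  val currentCount = values.sum
  val previousCount = state.getOrElse(0)
  Some(currentCount + previousCount)
}

val stateDstream = wordDstream.updateStateByKey[Int](updateFunc)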
You just have to tell Spark which log4j properties file to use. I think
--driver-java-options=-Dlog4j.configuration=log4j.properties should work
but it didn't for me. Setting
SPARK_JAVA_OPTS=-Dlog4j.configuration=log4j.properties did work though (this
was on Windows, in local mode, assuming you put a
Hi,
In an attempt to keep processing logic as simple as possible, I'm trying to
use spark streaming for processing historic as well as real-time data.
This works quite well, using big intervals that match the window size for
historic data, and small intervals for real-time.
I found this
You could try looking at ScalaCL [1]; it's targeting OpenCL rather than
CUDA, but that might be close enough?
cheers, Frank
1. https://github.com/ochafik/ScalaCL
On Wed, Aug 27, 2014 at 7:33 PM, Wei Tan w...@us.ibm.com wrote:
Thank you all. Actually I was looking at JCUDA. Function wise this
I feel like SchemaRDD has usage beyond just SQL. Perhaps it belongs in core?
Hello,
If I do:
DStream transform {
rdd.zipWithIndex.map {
Is the index guaranteed to be unique across all RDDs here?
}
}
Thanks,
-Soumitra.
No. The indices start at 0 for every RDD. -Xiangrui
On Wed, Aug 27, 2014 at 2:37 PM, Soumitra Kumar
kumar.soumi...@gmail.com wrote:
Hello,
If I do:
DStream transform {
rdd.zipWithIndex.map {
Is the index guaranteed to be unique across all RDDs here?
}
}
Thanks,
Hi,
I'm running on the master branch and I noticed that textFile ignores
minPartitions for bz2 files. Is anyone else seeing the same thing? I tried
varying minPartitions for a bz2 file and rdd.partitions.size was always 1
whereas doing it for a non-bz2 file worked.
Not sure if this matters or not
So, I guess zipWithUniqueId will be similar.
Is there a way to get a unique index?
On Wed, Aug 27, 2014 at 2:39 PM, Xiangrui Meng men...@gmail.com wrote:
No. The indices start at 0 for every RDD. -Xiangrui
On Wed, Aug 27, 2014 at 2:37 PM, Soumitra Kumar
kumar.soumi...@gmail.com wrote:
You can use the RDD id as the seed, which is unique within the same Spark
context. Suppose none of the RDDs would contain more than 1 billion
records. Then you can use
rdd.zipWithUniqueId().mapValues(uid => rdd.id * 1e9.toLong + uid)
Just a hack ..
On Wed, Aug 27, 2014 at 2:59 PM, Soumitra Kumar
Are you using hadoop-1.0? Hadoop doesn't support splittable bz2 files
before 1.2 (or a later version). But due to a bug
(https://issues.apache.org/jira/browse/HADOOP-10614), you should try
hadoop-2.5.0. -Xiangrui
On Wed, Aug 27, 2014 at 2:49 PM, jerryye jerr...@gmail.com wrote:
Hi,
I'm running
I think this will increasingly be its role, though it doesn't make sense to move it
into core because it is clearly just a client of the core APIs. What usage do
you have in mind in particular? It would be nice to know how the non-SQL APIs
for this could be better.
Matei
On August 27, 2014 at
Thanks.
Just to double check, rdd.id would be unique for a batch in a DStream?
On Wed, Aug 27, 2014 at 3:04 PM, Xiangrui Meng men...@gmail.com wrote:
You can use RDD id as the seed, which is unique in the same spark
context. Suppose none of the RDDs would contain more than 1 billion
Hey Matt, if you want to access existing Hive data, you still need to run
a Hive metastore service and provide a proper hive-site.xml (just drop it
in $SPARK_HOME/conf).
Could you provide the error log you saw?
On Wed, Aug 27, 2014 at 12:09 PM, Michael Armbrust mich...@databricks.com
Yeah - each batch will produce a new RDD.
On Wed, Aug 27, 2014 at 3:33 PM, Soumitra Kumar
kumar.soumi...@gmail.com wrote:
Thanks.
Just to double check, rdd.id would be unique for a batch in a DStream?
On Wed, Aug 27, 2014 at 3:04 PM, Xiangrui Meng men...@gmail.com wrote:
You can use RDD
Hello,
I've been seeing the following errors when trying to save to S3:
Exception in thread main org.apache.spark.SparkException: Job aborted due
to stage failure: Task 4058 in stage 2.1 failed 4 times, most recent failure:
Lost task 4058.3 in stage 2.1 (TID 12572,
I see an issue here.
If rdd.id is 1000 then rdd.id * 1e9.toLong would be BIG.
I wish there were a DStream mapPartitionsWithIndex.
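One way to approximate it is to reach RDD.mapPartitionsWithIndex through DStream.transform. A rough sketch (dstream is a placeholder, and the ID scheme is only illustrative, not collision-proof across batches):

// Tag each record with a per-batch ID built from the partition index and the
// position within the partition.
val indexed = dstream.transform { rdd =>
  rdd.mapPartitionsWithIndex { (partitionIndex, iter) =>
    iter.zipWithIndex.map { case (record, i) =>
      ((partitionIndex.toLong << 32) | i.toLong, record)
    }
  }
}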
On Wed, Aug 27, 2014 at 3:04 PM, Xiangrui Meng men...@gmail.com wrote:
You can use RDD id as the seed, which is unique in the same spark
context. Suppose none of the
Hi, Michael.
I used HiveContext to create a table with a field of type Array. However, in
the HQL results, this field was returned as type ArrayBuffer, which is mutable.
Would it make more sense for it to be an Array?
The Spark version of my test is 1.0.2. I haven't tested it on SQLContext or
newer
I'm not so sure that your error is coming from the cassandra write.
You have val data = test.map(..).map(..),
so data will actually not get computed until you try to save it. Can you try
to do something like data.count() or data.take(k) after this line and see if
you even get to the cassandra
Arrays in the JVM are also mutable. However, you should not be relying on
the exact type here. The only promise is that you will get back something
of type Seq[_].
On Wed, Aug 27, 2014 at 4:27 PM, Du Li l...@yahoo-inc.com wrote:
Hi, Michael.
I used HiveContext to create a table with a
I found this discrepancy when writing unit tests for my project. Basically the
expectation was that the returned type should match that of the input data.
Although it's easy to work around, it just felt a bit odd. Is there a
better reason to return ArrayBuffer?
From: Michael Armbrust
Hi,
I have Spark (1.0.0 on CDH5) running with Kafka 0.8.1.1.
I have a streaming job that reads from a Kafka topic and writes
output to another Kafka topic. The job starts fine but after a while
the input stream stops getting any data. I think these messages show
no incoming data on the stream:
In general the various language interfaces try to return the natural type
for the language. In python we return lists in scala we return Seqs.
Arrays on the JVM have all sorts of messy semantics (e.g. they are
invariant and don't have erasure).
On Wed, Aug 27, 2014 at 5:34 PM, Du Li
Hi Sameer,
MLbase started out as a set of three ML components on top of Spark. The
lowest level, MLlib, is now a rapidly growing component within Spark and is
maintained by the Spark community. The two higher-level components (MLI and
MLOpt) are experimental components that serve as testbeds for
I agree. This issue should be fixed in Spark rather than relying on replay of Kafka
messages.
Dib
On Aug 28, 2014 6:45 AM, RodrigoB rodrigo.boav...@aspect.com wrote:
Dibyendu,
Thanks for getting back.
I believe you are absolutely right. We were under the assumption that the
raw data was being
Hi,
I need to use Spark with HBase 0.98 and tried to compile Spark 1.0.2 with HBase
0.98,
My steps:
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2.tgz
tar -vxf spark-1.0.2.tgz
cd spark-1.0.2
edit project/SparkBuild.scala, set HBASE_VERSION
// HBase version; set as appropriate.
val
See SPARK-1297
The pull request is here:
https://github.com/apache/spark/pull/1893
On Wed, Aug 27, 2014 at 6:57 PM, arthur.hk.c...@gmail.com
arthur.hk.c...@gmail.com wrote:
(correction: "Compilation Error: Spark 1.0.2 with HBase 0.98", please
ignore if duplicated)
Hi,
I need to use
Hi Ted,
Thank you so much!!
As I am new to Spark, can you please advise on the steps to apply this
patch to my spark-1.0.2 source folder?
Regards
Arthur
On 28 Aug, 2014, at 10:13 am, Ted Yu yuzhih...@gmail.com wrote:
See SPARK-1297
The pull request is here:
Update:
I use a shell script to execute spark-shell; inside my-script.sh:
$SPARK_HOME/bin/spark-shell < $HOME/test.scala > $HOME/test.log 2>&1
Although it correctly finishes the println(hallo world), the strange thing
is that my-script.sh finishes before spark-shell even finishes executing
You can get the patch from this URL:
https://github.com/apache/spark/pull/1893.patch
BTW 0.98.5 has been released - you can specify 0.98.5-hadoop2 in the pom.xml
Cheers
On Wed, Aug 27, 2014 at 7:18 PM, arthur.hk.c...@gmail.com
arthur.hk.c...@gmail.com wrote:
Hi Ted,
Thank you so much!!
Hi Ted,
I tried the following steps to apply patch 1893 but got Hunk FAILED. Can
you please advise how to get through this error? Or is my spark-1.0.2 source not
the correct one?
Regards
Arthur
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2.tgz
tar -vxf spark-1.0.2.tgz
cd spark-1.0.2
Can you use this command?
patch -p1 -i 1893.patch
Cheers
On Wed, Aug 27, 2014 at 7:41 PM, arthur.hk.c...@gmail.com
arthur.hk.c...@gmail.com wrote:
Hi Ted,
I tried the following steps to apply the patch 1893 but got Hunk FAILED,
can you please advise how to get thru this error? or is my
Hi Ted,
Thanks.
Tried patch -p1 -i 1893.patch and got: Hunk #1 FAILED at 45.
Is this normal?
Regards
Arthur
patch -p1 -i 1893.patch
patching file examples/pom.xml
Hunk #1 FAILED at 45.
Hunk #2 succeeded at 94 (offset -16 lines).
1 out of 2 hunks FAILED -- saving rejects to file
You can use spark-shell -i file.scala to run that. However, that keeps the
interpreter open at the end, so you need to make your file end with
System.exit(0) (or even more robustly, do stuff in a try {} and add that in
finally {}).
In general it would be better to compile apps and run them
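A minimal shape for such a script (the body is just a placeholder):

// test.scala -- run with: spark-shell -i test.scala
// sc is already defined by the shell; wrap the work in try/finally so the
// interpreter always exits.
try {
  val data = sc.parallelize(1 to 100)
  println("sum = " + data.reduce(_ + _))
} finally {
  System.exit(0)
}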
Hi,
I use Hadoop 2.4.1, HBase 0.98.5, Zookeeper 3.4.6 and Hive 0.13.1.
I just tried to compile Spark 1.0.2 but got an error on Spark Project Hive. Can
you please advise which repository has
org.spark-project.hive:hive-metastore:jar:0.13.1?
FYI, below is my repository setting in maven which
Looks like the patch given by that URL only had the last commit.
I have attached a pom.xml for spark-1.0.2 to SPARK-1297.
You can download it and replace examples/pom.xml with the downloaded pom.
I am running this command locally:
mvn -Phbase-hadoop2,hadoop-2.4,yarn -DskipTests clean package
See this thread:
http://search-hadoop.com/m/JW1q5wwgyL1/Working+Formula+for+Hive+0.13subj=Re+Working+Formula+for+Hive+0+13+
On Wed, Aug 27, 2014 at 8:54 PM, arthur.hk.c...@gmail.com
arthur.hk.c...@gmail.com wrote:
Hi,
I use Hadoop 2.4.1, HBase 0.98.5, Zookeeper 3.4.6 and Hive 0.13.1.
I
Hi Yana
I have done a take and confirmed the existence of data. I also checked that it is
getting connected to Cassandra. That is why I suspect that this particular
RDD is not serializable.
Thanks,
Lmk
On Aug 28, 2014 5:13 AM, Yana [via Apache Spark User List]
ml-node+s1001560n12960...@n3.nabble.com
I forgot to include '-Dhadoop.version=2.4.1' in the command below.
The modified command passed.
You can verify the dependency on HBase 0.98 through this command:
mvn -Phbase-hadoop2,hadoop-2.4,yarn -Dhadoop.version=2.4.1 -DskipTests
dependency:tree > dep.txt
Cheers
On Wed, Aug 27, 2014 at
Hi,
We have migrated Pig functionality on top of Spark, passing 100% of e2e tests for
success cases in the Pig test suite. That means UDFs, joins and other
functionality are working quite nicely. We are in the process of merging
with Apache Pig trunk (something that should happen over the next 2 weeks).
Meanwhile if
Awesome to hear this, Mayur! Thanks for putting this together.
Matei
On August 27, 2014 at 10:04:12 PM, Mayur Rustagi (mayur.rust...@gmail.com)
wrote:
Hi,
We have migrated Pig functionality on top of Spark, passing 100% of e2e tests for success
cases in the Pig test suite. That means UDFs, joins and other
Hi,
I have two files:
main_app.py and helper.py
main_app.py calls some functions in helper.py.
I want to use spark-submit to submit a job, but how do I specify helper.py?
Basically, how do I specify multiple files in Spark?
Thanks
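In case it helps, spark-submit takes a --py-files option for shipping extra Python files with the job; something along these lines (the master setting and file paths are assumptions):

./bin/spark-submit --master local[*] --py-files helper.py main_app.py

helper.py is then distributed to the executors and put on the Python path so main_app.py can import it.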