You can check out the following library:
https://github.com/alexholmes/json-mapreduce
--
Emre Sevinç
On Sun, May 3, 2015 at 10:04 PM, Olivier Girardot
o.girar...@lateral-thoughts.com wrote:
Hi everyone,
Is there any way in Spark SQL to load multi-line JSON data efficiently? I
think
Exception in thread "main" java.lang.RuntimeException:
org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot
communicate with client version 4
I am not using any Hadoop facility (not even HDFS), so why is it giving
this error?
--
Thanks Regards,
Anshu Shukla
On Mon, May 4, 2015 at 9:50 AM, anshu shukla anshushuk...@gmail.com
wrote:
Exception in thread "main" java.lang.RuntimeException:
org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot
communicate with client version 4
I am not using any Hadoop facility (not even HDFS), then why it
I took a quick look at that implementation. I'm not sure if it actually
handles JSON correctly, because it attempts to find the first { starting
from a random point. However, that random point could be in the middle of a
string, and thus the first { might just be part of a string, rather than a
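The failure mode can be shown in a few lines of plain Python (a sketch; `naive_record_start` is a hypothetical helper illustrating the approach described above, not code from json-mapreduce):

```python
# Sketch of the problem: treating the first '{' at or after a split offset
# as a record boundary fails when a '{' occurs inside a JSON string.
def naive_record_start(data: str, offset: int) -> int:
    """Naive boundary search: index of the first '{' at or after offset."""
    return data.find("{", offset)

data = '{"note": "use { carefully"}{"note": "ok"}'
# A split landing at offset 10 finds the '{' inside the first string
# literal (index 14), not the start of the second record (index 27).
print(naive_record_start(data, 10))  # 14
print(data.find('{"note": "ok"}'))   # 27
```

Any scheme that scans for structural characters without tracking string state can be fooled this way.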
Using the built-in Maven and Zinc, each build takes around 10 minutes.
Is that reasonable?
My maven opts looks like this:
$ echo $MAVEN_OPTS
-Xmx12000m -XX:MaxPermSize=2048m
I'm running it as build/mvn -DskipTests package
Should I be tweaking my Zinc/Nailgun config?
Pramod
On Sun, May 3,
Hello Pramod,
Do you need to build the whole project every time? Generally you don't,
e.g., when I was changing some files that belong only to Spark Streaming, I
was building only the streaming module (of course after having built and
installed the whole project, but that was done only once), and then
I think Reynold’s argument shows the impossibility of the general case.
But a “maximum object depth” hint could enable a new input format to do its job
both efficiently and correctly in the common case where the input is an array
of similarly structured objects! I’d certainly be interested
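To illustrate the correct half of the job, once a scanner knows it is standing at a genuine record boundary, tracking brace depth and string/escape state is enough to split top-level objects (a hedged sketch; `split_top_level` is a hypothetical function, and the "maximum object depth" hint is what would let a reader recover a boundary when it starts mid-stream):

```python
# Sketch: split concatenated top-level JSON objects by tracking brace depth
# and string/escape state. Assumes the scan starts at a record boundary.
def split_top_level(data: str) -> list[str]:
    out = []
    depth = 0
    in_string = False
    escaped = False
    start = 0
    for i, c in enumerate(data):
        if in_string:
            # Inside a string, only the escape state and closing quote matter.
            if escaped:
                escaped = False
            elif c == "\\":
                escaped = True
            elif c == '"':
                in_string = False
        elif c == '"':
            in_string = True
        elif c == "{":
            if depth == 0:
                start = i
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                out.append(data[start:i + 1])
    return out
```

For example, `split_top_level('{"a": "x } y"}{"b": {"c": 2}}')` yields the two objects, even though a brace appears inside a string.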
No, I just need to build one project at a time. Right now SparkSql.
Pramod
On Mon, May 4, 2015 at 12:09 AM, Emre Sevinc emre.sev...@gmail.com wrote:
Hello Pramod,
Do you need to build the whole project every time? Generally you don't,
e.g., when I was changing some files that belong only to
Just to give you an example:
When I was trying to make a small change only to the Streaming component of
Spark, first I built and installed the whole Spark project (this took about
15 minutes on my 4-core, 4 GB RAM laptop). Then, after having changed files
only in Streaming, I ran something like
I was wondering if it's possible to use existing Hive SerDes for this?
On Mon, May 4, 2015 at 8:36 AM, Joe Halliwell joe.halliw...@gmail.com
wrote:
I think Reynold’s argument shows the impossibility of the general case.
But a “maximum object depth” hint could enable a new input format to do
Hi,
Is it really necessary to run mvn --projects assembly/ -DskipTests
install? Could you please explain why this is needed?
I got the changes after running mvn --projects streaming/ -DskipTests
package.
Regards,
Meethu
On Monday 04 May 2015 02:20 PM,
I'd like to update the information about using Eclipse to develop on the
Spark project found on this page:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=38572224
I don't see any way to edit this page (I created an account). Since it's a
wiki, I assumed it's supposed to be
I think it's only committers that can edit it. I suppose you can open
a JIRA with a suggested text change if it is significant enough to
need discussion. If it's trivial, just post it here and someone can
take care of it.
On Mon, May 4, 2015 at 2:32 PM, Iulian Dragoș
iulian.dra...@typesafe.com
Hi Devs,
Just an announcement that I've cut Spark's branch 1.4 to form the
basis of the 1.4 release. Other than a few stragglers, this represents
the end of active feature development for Spark 1.4. Per usual, if
committers are merging any features, please be in touch so I can help
coordinate.
There is an LDA example in the MLlib examples. You can run it like this:
./bin/run-example mllib.LDAExample --stopwordFile <stopwords> <input documents>
<stopwords> is a file of stop words, one per line. <input documents> is the
text of each document, one document per line. To see all the options
Ok, here’s how it should be:
- Eclipse Luna
- Scala IDE 4.0
- Scala Test
The easiest way is to download the Scala IDE bundle from the Scala IDE
download page http://scala-ide.org/download/sdk.html. It comes
pre-installed with ScalaTest. Alternatively, use the provided
It's not JSON, per se, but data formats like Smile (
http://en.wikipedia.org/wiki/Smile_%28data_interchange_format%29) provide
support for markers that can't be confused with content and also provide
reasonably similar ergonomics.
—
p...@mult.ifario.us | Multifarious, Inc. |
...and now the workers all have java6 installed.
https://issues.apache.org/jira/browse/SPARK-1437
sadly, the built-in jenkins jdk management doesn't allow us to choose a JDK
version within matrix projects... so we need to manage this stuff
manually.
On Sun, May 3, 2015 at 8:57 AM, shane knapp
If we just set JAVA_HOME in dev/run-test-jenkins, I think it should work.
On Mon, May 4, 2015 at 7:20 PM, shane knapp skn...@berkeley.edu wrote:
...and now the workers all have java6 installed.
https://issues.apache.org/jira/browse/SPARK-1437
sadly, the built-in jenkins jdk management
sgtm
On Mon, May 4, 2015 at 11:23 AM, Patrick Wendell pwend...@gmail.com wrote:
If we just set JAVA_HOME in dev/run-test-jenkins, I think it should work.
On Mon, May 4, 2015 at 7:20 PM, shane knapp skn...@berkeley.edu wrote:
...and now the workers all have java6 installed.
Hi,
I have been investigating scheduling delays in Spark and I found some
unexplained anomalies. In my use case, I have two stages after
collapsing the transformations: the first is a mapPartitions() and the
second is a sortByKey(). I found that the task serialization for the
first stage takes
Joe - I think that's a legit and useful thing to do. Do you want to give it
a shot?
On Mon, May 4, 2015 at 12:36 AM, Joe Halliwell joe.halliw...@gmail.com
wrote:
I think Reynold’s argument shows the impossibility of the general case.
But a “maximum object depth” hint could enable a new input
I don't know whether this is common, but we might also allow another separator
for JSON objects, such as two blank lines.
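A reserved separator makes splitting trivial, with no JSON awareness at all (a sketch of the idea, assuming two blank lines between records, i.e. three consecutive newlines; `split_records` is hypothetical, not an existing API):

```python
# Sketch: with two blank lines reserved as a record separator, records can be
# split without parsing, since the separator cannot occur inside a record.
def split_records(data: str) -> list[str]:
    # Two blank lines between records means three consecutive newlines.
    return [r for r in data.split("\n\n\n") if r]

docs = '{"a": 1}\n\n\n{"b":\n 2}\n\n\n{"c": 3}'
print(split_records(docs))  # three records, one of them multi-line
```

The trade-off is that producers must guarantee no record ever contains two consecutive blank lines, which is easy for machine-generated output but is a convention, not part of JSON itself.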
Matei
On May 4, 2015, at 2:28 PM, Reynold Xin r...@databricks.com wrote:
Joe - I think that's a legit and useful thing to do. Do you want to give it
a shot?
On Mon,
Hi, I am training a GMM with 10 Gaussians on a 4 GB dataset (720,000 × 760).
The spark (1.3.1) job is allocated 120 executors with 6GB each and the
driver also has 6GB.
Spark Config Params:
.set("spark.hadoop.validateOutputSpecs",
"false").set("spark.dynamicAllocation.enabled",
In addition to Michael's suggestion, in my SBT workflow I also use ~ to
automatically kick off builds and unit tests. For example,
sbt/sbt ~streaming/test-only *BasicOperationsSuite*
It will automatically detect any file changes in the project and kick off
compilation and testing.
So my full
Hey All,
Community testing during the QA window is an important part of the
release cycle in Spark. It helps us deliver higher quality releases by
vetting out issues not covered by our unit tests.
I was thinking that from now on, it would be nice to recognize the
organizations that donate time
@joe, I'd be glad to help if you need.
On Mon, May 4, 2015 at 8:06 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
I don't know whether this is common, but we might also allow another
separator for JSON objects, such as two blank lines.
Matei
On May 4, 2015, at 2:28 PM, Reynold Xin
FWIW... My Spark SQL development workflow is usually to run build/sbt
sparkShell or build/sbt 'sql/test-only testSuiteName'. These commands
start in as little as 30s on my laptop, automatically figure out which
subprojects need to be rebuilt, and don't require the expensive assembly
creation.
After talking with people on this thread and offline, I've decided to go
with option 1, i.e. putting everything in a single functions object.
On Thu, Apr 30, 2015 at 10:04 AM, Ted Yu yuzhih...@gmail.com wrote:
IMHO I would go with choice #1
Cheers
On Wed, Apr 29, 2015 at 10:03 PM, Reynold