The spark-training scripts are not presently working 100%: the errors
displayed when starting the slaves are shown below.
Possibly a newer location for the files exists (I pulled from
https://github.com/amplab/training-scripts and it is nearly six months old).
cp: cannot create regular file
: link_stat /root/mesos-ec2 failed: No such file or directory (2)
But in this latest version the mesos errors appear not to be fatal: the
cluster is in the process of coming up (copying wikipedia data now ...).
2014-03-08 6:26 GMT-08:00 Stephen Boesch java...@gmail.com:
The spark-training scripts
Hi Folks
I'm looking to buy some gear to run Spark. I'm quite well versed in Hadoop
server design, but there does not seem to be much Spark-related collateral
around infrastructure guidelines (or at least I haven't been able to find
any). My current thinking for server design is something
We have a Spark server already running. When invoking spark-shell, an
attempt is made to start a new HTTP server:
spark.HttpServer: Starting HTTP Server
But that attempt results in a BindException due to the pre-existing
server:
java.net.BindException: Address already in use
What is the
of spark-submit.
Thanks
On May 5, 2014, at 10:24 PM, Stephen Boesch java...@gmail.com wrote:
I have a spark streaming application that uses the external streaming
modules (e.g. kafka, mqtt, ..) as well. It is not clear how to properly
invoke the spark-submit script: what
@Sonal - makes sense. Is the maven shade plugin runnable within sbt? If
so, would you care to share those build.sbt (or .scala) lines? If not, are
you aware of a similar plugin for sbt?
2014-05-11 23:53 GMT-07:00 Sonal Goyal sonalgoy...@gmail.com:
Hi Stephen,
I am using maven shade
It seems the concept I had been missing is to invoke the DStream foreach
method. This method takes a function expecting an RDD and applies the
function to each RDD within the DStream.
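A minimal sketch of that, assuming a DStream[String] named lines (a made-up name; the same hook is exposed as foreachRDD in later releases):

    lines.foreachRDD { rdd =>
      // invoked once per batch interval with that batch's RDD
      println("batch count = " + rdd.count())
    }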
2014-05-14 21:33 GMT-07:00 Stephen Boesch java...@gmail.com:
Looking further it appears the functionality I
) = Unit*
) extends DStream[Unit](parent.ssc) {
I would like to have access to this structure - particularly the ability to
define a foreachFunc that gets applied to each RDD within the DStream.
Is there a means to do so?
2014-05-14 21:25 GMT-07:00 Stephen Boesch java...@gmail.com:
Given
Hi Marco,
Hive itself is not working in the CDH5.0 VM (due to FNFE's on the third
party jars). While you did not mention using Shark, you may keep that in
mind. I will try out spark-only commands late today and report what I find.
2014-05-14 5:00 GMT-07:00 Marco Shaw marco.s...@gmail.com:
There is a bin/run-example.sh example-class [args]
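For example, to run one of the bundled examples (SparkPi is used here purely as an illustration; the MLlib examples live under the org.apache.spark.examples.mllib package and follow the same pattern):

    bin/run-example org.apache.spark.examples.SparkPi local[2]

Argument handling differs slightly between Spark versions, so the exact args may vary.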
2014-05-22 12:48 GMT-07:00 yxzhao yxz...@ualr.edu:
I want to run the LR, SVM, and NaiveBayes algorithms implemented in the
following directory on my data set, but I did not find a sample command
line to run them. Can anybody help? Thanks.
We are using a back-level version of Spark (0.8.1) that depends on a customized
version of Kafka, 0.7.2-spark. Where are the sources for it - either
svn/github or simply the sources.jar?
For reference here is the maven repo location for the binaries:
The MergeStrategy combined with sbt assembly did work for me. This is not
painless: it takes some trial and error, and the assembly may take multiple
minutes. You will likely want to filter out some additional classes from the
generated jar file. Here is an SOF answer that explains that, and with IMHO
the
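For reference, a sketch of the kind of merge-strategy override involved; the exact sbt-assembly keys differ between plugin versions and the patterns below are illustrative assumptions, not the exact ones I used:

    // build.sbt (sbt-assembly 0.12+ style)
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case _                             => MergeStrategy.first
    }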
The present trunk is built and tested against HBase 0.94.
I have tried various combinations of versions of HBase 0.96+ and Spark 1.0+
and all end up with
14/06/27 20:11:15 INFO HttpServer: Starting HTTP Server
[error] (run-main-0) java.lang.SecurityException: class
GMT-07:00 Sean Owen so...@cloudera.com:
This sounds like an instance of roughly the same item as in
https://issues.apache.org/jira/browse/SPARK-1949 Have a look at
adding that exclude to see if it works.
On Fri, Jun 27, 2014 at 10:21 PM, Stephen Boesch java...@gmail.com
wrote:
The present
Hi Jerry,
To add to your question:
The following does work (from master) - notice the registerAsTable is commented
out (I took the liberty of adding the order-by clause):
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
import hiveContext._
hql("USE test")
// hql(select id from
Hi Sean
RE: Windows and hadoop 2.4.x
HortonWorks - all the hype aside - only supports Windows Server 2008/2012.
So this general concept of supporting Windows is bunk.
Given that - and since the vast majority of Windows users do not happen to
have Windows Server on their laptop - do you have any
The javaDoc seems reasonably helpful:
/**
* A lower-level, unstable API intended for developers.
*
* Developer API's might change or be removed in minor versions of Spark.
*
*/
These would be contrasted with non-Developer (more or less
production?) API's that are deemed to be stable within a
I am noticing disparities in behavior between the REPL and my standalone
program in terms of implicit conversion of an RDD to a SchemaRDD.
In the REPL the following sequence works:
import sqlContext._
val mySchemaRDD = myNormalRDD.where(1=1)
However when attempting similar in a standalone
at SchemaRDD.scala:102
== Query Plan ==
== Physical Plan ==
Filter 1=1
ExistingRdd [col1#8,col2#9], MapPartitionsRDD[27] at mapPartitions at
basicOperators.scala:219
So .. what is the magic formula for setting up the imports for the
SchemaRDD imports to work properly?
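A minimal sketch of what I believe the standalone setup should look like (names are made up; the assumption is that the implicits must be imported from the concrete sqlContext instance):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.{SQLContext, SchemaRDD}

    case class Rec(col1: String, col2: Int)

    object SchemaRddTest {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("schemardd-test").setMaster("local[2]"))
        val sqlContext = new SQLContext(sc)
        import sqlContext._    // includes the createSchemaRDD implicit conversion

        val myNormalRDD = sc.parallelize(Seq(Rec("a", 1), Rec("b", 2)))
        val mySchemaRDD: SchemaRDD = myNormalRDD   // implicit conversion applies here
        println(mySchemaRDD.count())
      }
    }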
2014-10-02 2:00 GMT-07:00 Stephen
Consider that some connection / external resource needs to be
accessed/mutated for each of the rows from within a single worker
thread. That connection should only be opened before the first row
is accessed and closed after the last row is completed.
It is my understanding that
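The usual pattern for this kind of per-partition resource is foreachPartition (or mapPartitions), opening the resource once per partition. A sketch, with a hypothetical openConnection() standing in for the real allocation and pairRdd for the data:

    pairRdd.foreachPartition { rows =>
      val conn = openConnection()     // hypothetical factory; opened once, before the first row
      try {
        rows.foreach { row =>
          // use conn for this row
        }
      } finally {
        conn.close()                  // closed once, after the last row of the partition
      }
    }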
The build instructions for pyspark appear to be:
sbt/sbt assembly
Given that maven is the preferred build tool since July 1, presumably I
have overlooked the instructions for building via maven? Could anyone please
point them out? Thanks.
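For reference, the maven route I would expect to work is along these lines (a sketch; -DskipTests is just to speed things up):

    mvn -DskipTests clean package
    ./bin/pyspark

My assumption is that bin/pyspark picks up the assembly jar produced under assembly/target, so no pyspark-specific build step should be needed beyond the package phase.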
/classes to the
python module search path.
2014-10-08 14:01 GMT-07:00 Stephen Boesch java...@gmail.com:
The build instructions for pyspark appear to be:
sbt/sbt assembly
Given that maven is the preferred build tool since July 1, presumably I
have overlooked the instructions for building via
Is the following what you are looking for?
scala> sc.parallelize(myMap.map { case (k, v) => (k, v) }.toSeq)
res2: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at
parallelize at <console>:21
2014-10-13 14:02 GMT-07:00 jon.g.massey jon.g.mas...@gmail.com:
Hi guys,
Just
After having checked out from master/head the following error occurs when
attempting to run any test in Intellij
Exception in thread main java.lang.NoClassDefFoundError:
com/google/common/util/concurrent/ThreadFactoryBuilder
at org.apache.spark.util.Utils$.init(Utils.scala:648)
There appears to
Yes it is necessary to do a mvn clean when encountering this issue.
Typically you would have changed one or more of the profiles/options -
which leads to this occurring.
2014-10-22 22:00 GMT-07:00 Ryan Williams ryan.blake.willi...@gmail.com:
I started building Spark / running Spark tests this
; if anyone
can confirm whether they've seen it on Linux, that would be good to know.
Stephen: good to know re: profiles/options. I don't think changing them is
a necessary condition as I believe I've run into it without doing that, but
any set of steps to reproduce this would be welcome so that we
I seem to recall there were some specific requirements on how to import the
implicits.
Here is the issue:
scala> import org.apache.spark.mllib.rdd.RDDFunctions._
<console>:10: error: object RDDFunctions in package rdd cannot be accessed
in package org.apache.spark.mllib.rdd
import
be called by function in mllib.
2014-10-28 17:09 GMT+08:00 Stephen Boesch java...@gmail.com:
I seem to recall there were some specific requirements on how to import
the implicits.
Here is the issue:
scala> import org.apache.spark.mllib.rdd.RDDFunctions._
<console>:10: error: object RDDFunctions
at 2:13 PM, Stephen Boesch java...@gmail.com wrote:
After having checked out from master/head the following error occurs when
attempting to run any test in Intellij
Exception in thread main java.lang.NoClassDefFoundError:
com/google/common/util/concurrent/ThreadFactoryBuilder
I have checked out from master, cleaned/rebuilt on the command line with maven,
then cleaned/rebuilt in IntelliJ many times. This error persists through it
all. Does anyone have a solution?
2014-10-23 1:43 GMT-07:00 Stephen Boesch java...@gmail.com:
After having checked out from master/head
As a template for creating a broadcast variable, the following code snippet
within mllib was used:
val bcIdf = dataset.context.broadcast(idf)
dataset.mapPartitions { iter =>
val thisIdf = bcIdf.value
The new code follows that model:
import org.apache.spark.mllib.linalg.{Vector =
= sc.broadcast(crows)
..
val arrayVect = bcRows.value
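Putting the fragments together, the complete pattern I am following looks roughly like this (a sketch; crows and inputRdd are made-up names):

    import org.apache.spark.mllib.linalg.{Vector, Vectors}

    val crows: Array[Vector] = Array(Vectors.dense(1.0, 2.0), Vectors.dense(3.0, 4.0))
    val bcRows = sc.broadcast(crows)          // shipped once per executor rather than per task

    val result = inputRdd.mapPartitions { iter =>
      val arrayVect = bcRows.value            // read the broadcast value on the worker side
      iter.map(x => x)                        // placeholder: use arrayVect together with x here
    }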
2014-10-30 7:42 GMT-07:00 Stephen Boesch java...@gmail.com:
As a template for creating a broadcast variable, the following code
snippet within mllib was used:
val bcIdf = dataset.context.broadcast(idf)
dataset.mapPartitions
Has anyone had luck with this? An issue encountered is handling multiple
languages - Python, Java, Scala - within one module: it is unclear how to
select two module SDKs.
Both Python and Scala facets were added to the spark-parent module. But
when the Project level SDK is not set to Python then the
)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
--
Regards,
Stephen Merity
Data Scientist @ Common Crawl
Hi!
tldr; We're looking at potentially using Spark+GraphX to compute PageRank
over a 4 billion node + 128 billion edge graph on a regular (monthly) basis,
possibly growing larger in size over time. If anyone has hints / tips /
upcoming optimizations I should test use (or wants to contribute --
What is the proper way to build with Hive from sbt? The SPARK_HIVE flag is
deprecated. However, after running the following:
sbt -Pyarn -Phadoop-2.3 -Phive assembly/assembly
And then
bin/pyspark
hivectx = HiveContext(sc)
hivectx.hiveql("select * from my_table")
Exception: (You must build
Did you receive any response on this? I am trying to load HBase classes
and am getting the same error: py4j.protocol.Py4JError: Trying to call a
package. This occurs even though $HBASE_HOME/lib/* had already been added to
compute-classpath.sh.
2014-10-21 16:02 GMT-07:00 Mike Sukmanowsky
/best practice
for cogroup code?
Thanks,
Stephen
be safer in this regard, but I don't understand the nuances yet.
- Stephen
Hi Shixiong,
The Iterable from cogroup is CompactBuffer, which is already
materialized. It's not a lazy Iterable. So now Spark cannot handle
skewed data that some key has too many values that cannot be fit into
the memory.
Cool, thanks for the confirmation.
- Stephen
like a breaking change to the spark.eventLog.dir
config property.
Perhaps it should be patched to convert the previously supported
plain file-path values to HDFS-compatible file://... URIs
for backwards compatibility?
- Stephen
On Wed, 28 Jan 2015 12:27:17 -0800
Krishna Sankar ksanka
?
It is possible I did something dumb while compiling master,
but I'm not sure what it would be.
Thanks,
Stephen
Amit - IntelliJ will not find it until you add the import as Sean mentioned. It
includes implicits that IntelliJ will not know about otherwise.
2015-01-30 12:44 GMT-08:00 Amit Behera amit.bd...@gmail.com:
I am sorry Sean.
I am developing code in IntelliJ IDEA, so with the above dependencies I am
Theoretically your approach would require less overhead - i.e. a collect on
the driver is not required as the last step. But maybe the difference is
small and that particular path may or may not have been properly optimized
vs the count(). Do you have a biggish data set to compare the timings?
Looking at https://github.com/apache/spark/pull/1222/files ,
the following change may have caused what Stephen described:
+ if (!fileSystem.isDirectory(new Path(logBaseDir))) {
When there is no schema associated with logBaseDir, local path
should be assumed.
Yes, that looks right
I am finding that partitionBy is hanging - and it is not clear whether the
custom partitioner is even being invoked (I put an exception in there and
cannot see it in the worker logs).
The structure is similar to the following:
inputPairedRdd = sc.parallelize([{0:Entry1,1,Entry2}])
def
You have sent four questions that are very general in nature. They might be
better answered if you googled for those topics: there is a wealth of
materials available.
2015-03-02 2:01 GMT-08:00 dubey_a abhishek.du...@xoriant.com:
What are the ways to tune query performance in Spark SQL?
--
into the directory structure
boolean recursive = job.getBoolean(INPUT_DIR_RECURSIVE, false);
where:
public static final String INPUT_DIR_RECURSIVE =
"mapreduce.input.fileinputformat.input.dir.recursive";
FYI
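From the Spark side, the sketch below sets that flag on the SparkContext's Hadoop configuration before the textFile call (an assumption based on the FileInputFormat code above; the old mapred API reads the analogous key mapred.input.dir.recursive):

    sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
    sc.hadoopConfiguration.set("mapred.input.dir.recursive", "true")   // old-API equivalent
    val lines = sc.textFile("/path/to/top-level/dir")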
On Tue, Mar 3, 2015 at 3:14 PM, Stephen Boesch java...@gmail.com wrote
sc.textFile() invokes the Hadoop FileInputFormat via the (subclass)
TextInputFormat. The logic to do recursive directory reading does exist
inside - i.e. first detecting whether an entry is a directory and, if so,
descending:
for (FileStatus
Hi Xi,
Yes,
You can do the following:
val sc = new SparkContext("local[2]", "mptest")
// or .. val sc = new SparkContext("spark://master:7070", "mptest")
val fileDataRdd = sc.textFile("/path/to/dir")
val fileLines = fileDataRdd.take(100)
The key here - i.e. the answer to your specific question -
Is there a way to take advantage of the underlying datasource partitions
when generating a DataFrame/SchemaRDD via catalyst? It seems from the sql
module that the only options are RangePartitioner and HashPartitioner - and
further that those are selected automatically by the code. It was not
So I can’t for the life of me get something even simple working for Spark on
Mesos.
I installed a 3 master, 3 slave mesos cluster, which is all configured, but I
can’t for the life of me even get the spark shell to work properly.
I get errors like this
org.apache.spark.SparkException: Job
So I installed Spark 1.3.1, built with hadoop2.6, on each of the slaves; I just
basically got the pre-built version from the spark website…
I placed those compiled spark installs on each slave at /opt/spark
My spark properties seem to be getting picked up on my side fine…
There are questions in all three languages.
2015-05-05 3:49 GMT-07:00 Kartik Mehta kartik.meht...@gmail.com:
I too have similar question.
My understanding is that since Spark is written in Scala, having done it in Scala
will be OK for the certification.
It would help if someone who has done the certification can confirm.
Hi Akhil, building with sbt tends to need around 3.5GB whereas the maven
requirements are much lower, around 1.7GB. So try using maven.
For reference I have the following settings and both do compile. sbt would
not work with lower values.
$echo $SBT_OPTS
-Xmx3012m -XX:MaxPermSize=512m
Yea, I wouldn't try to modify the current one since RDDs are supposed to be
immutable; just create a new one...
val newRdd = oldRdd.map(r => (r._2, r._1))
or something of that nature...
Steve
From: Evo Eftimov [evo.efti...@isecc.com]
Sent: Thursday, May 14, 2015
didn’t exist
there. When run as root, it ran totally fine with no problems whatsoever.
Hopefully this works for you too,
Steve
On May 13, 2015, at 11:45 AM, Sander van Dijk sgvand...@gmail.com wrote:
Hey all,
I seem to be experiencing the same thing as Stephen. I run Spark 1.2.1 with
Mesos
Yup, exactly as Tim mentioned on it too. I went back and tried what you just
suggested and that was also perfectly fine.
Steve
On May 13, 2015, at 1:58 PM, Tim Chen
t...@mesosphere.iomailto:t...@mesosphere.io wrote:
Hi Stephen,
You probably didn't run the Spark driver/shell as root, as Mesos
The hadoop support from HortonWorks only *actually* works with Windows
Server - well, at least as of Spark Summit last year - and AFAIK that has
not changed since.
2015-04-16 15:18 GMT-07:00 Dean Wampler deanwamp...@gmail.com:
If you're running Hadoop, too, now that Hortonworks supports Spark,
they can be made public inside the library or have some
interface to them such that child classes can make use of them?
Thanks,
Stephen Carman, M.S.
AI Engineer, Coldlight Solutions, LLC
Cell - 267 240 0363
What conditions would cause the following delays / failure for a standalone
machine/cluster to have the Worker contact the Master?
15/05/20 02:02:53 INFO WorkerWebUI: Started WorkerWebUI at
http://10.0.0.3:8081
15/05/20 02:02:53 INFO Worker: Connecting to master
Hi User group,
We are using spark Linear Regression with SGD as the optimization technique and
we are achieving very sub-optimal results.
Can anyone shed some light on why this implementation seems to produce such
poor results vs our own implementation?
We are using a very small dataset, but
Oryx 2 has a scala client
https://github.com/OryxProject/oryx/blob/master/framework/oryx-api/src/main/scala/com/cloudera/oryx/api/
2015-06-20 11:39 GMT-07:00 Debasish Das debasish.da...@gmail.com:
After getting used to Scala, writing Java is too much work :-)
I am looking for scala based
I downloaded the 1.3.1 distro tarball
$ll ../spark-1.3.1.tar.gz
-rw-r-@ 1 steve staff 8500861 Apr 23 09:58 ../spark-1.3.1.tar.gz
However the build on it is failing with an unresolved dependency:
*configuration
not public*
$ build/sbt assembly -Dhadoop.version=2.5.2 -Pyarn -Phadoop-2.4
(the same btw applies for the Node where you run the driver app – all
other nodes must be able to resolve its name)
*From:* Stephen Boesch [mailto:java...@gmail.com]
*Sent:* Wednesday, May 20, 2015 10:07 AM
*To:* user
*Subject:* Intermittent difficulties for Worker to contact Master on
same
A colleague and I were having a discussion and we were disagreeing about
something in Spark/Mesos that perhaps someone can shed some light on.
We have a mesos cluster that runs spark via a sparkHome, rather than
downloading an executable and such.
My colleague says that say we have parquet
TestRunner: power-iteration-clustering  8  512.0 MB  2015/05/27 12:44:03  steve  FINISHED  6 s
app-20150527123822-  TestRunner: power-iteration-clustering  8  512.0 MB  2015/05/27 12:38:22  steve  FINISHED  6 s
2015-05-27 11:42 GMT-07:00 Stephen Boesch java...@gmail.com:
Thanks Yana,
My current
Vanilla map/reduce does not expose it, but Hive on top of map/reduce has
superior partitioning (and bucketing) support compared to Spark.
2015-06-28 13:44 GMT-07:00 Koert Kuipers ko...@tresata.com:
spark is partitioner aware, so it can exploit a situation where 2 datasets
are partitioned the same way
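A sketch of that situation: when two pair RDDs have been partitioned with the same partitioner (and cached), the join can reuse the existing partitioning instead of reshuffling both sides (rdd1/rdd2 are made-up names):

    import org.apache.spark.HashPartitioner

    val p = new HashPartitioner(8)
    val a = rdd1.partitionBy(p).cache()
    val b = rdd2.partitionBy(p).cache()
    val joined = a.join(b)     // co-partitioned inputs, so neither side is fully reshuffled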
Hi Ricardo,
providing the error output would help. But in any case you need to do a
collect() on the rdd returned from computeCost.
2015-05-19 11:59 GMT-07:00 Ricardo Goncalves da Silva
ricardog.si...@telefonica.com:
Hi,
Can anybody see what’s wrong in this piece of code:
I am building spark with the following options - most notably the
**scala-2.11**:
. dev/switch-to-scala-2.11.sh
mvn -Phive -Pyarn -Phadoop-2.6 -Dhadoop2.6.2 -Pscala-2.11 -DskipTests
-Dmaven.javadoc.skip=true clean package
The build goes pretty far but fails in one of the minor modules.
FYI
On Sun, Aug 16, 2015 at 11:12 AM, Stephen Boesch java...@gmail.com
wrote:
I am building spark with the following options - most notably the
**scala-2.11**:
. dev/switch-to-scala-2.11.sh
mvn -Phive -Pyarn -Phadoop-2.6 -Dhadoop2.6.2 -Pscala-2.11 -DskipTests
-Dmaven.javadoc.skip
Given the following command line to spark-submit:
bin/spark-submit --verbose --master local[2]--class
org.yardstick.spark.SparkCoreRDDBenchmark
/shared/ysgood/target/yardstick-spark-uber-0.0.1.jar
Here is the output:
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes
When deploying a spark streaming application I want to be able to retrieve
the latest Kafka offsets that were processed by the pipeline, and create
my Kafka direct streams from those offsets. Because the checkpoint
directory isn't guaranteed to be compatible between job deployments, I
don't want
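For reference, the kind of call I have in mind is the direct-stream overload that takes explicit offsets (a sketch for the Kafka 0.8 integration; loadSavedOffsets and ssc are assumed to exist, and the offsets would come from wherever they were persisted):

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val fromOffsets: Map[TopicAndPartition, Long] = loadSavedOffsets()   // hypothetical helper
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
      ssc, kafkaParams, fromOffsets,
      (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))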
NoClassDefFoundError differs from ClassNotFoundException: it
indicates an error while initializing that class, but the class is found on
the classpath. Please provide the full stack trace.
2015-08-14 4:59 GMT-07:00 stelsavva stel...@avocarrot.com:
Hello, I am just starting out with
When using spark-submit: which directory contains third-party libraries
that will be loaded on each of the slaves? I would like to scp one or more
libraries to each of the slaves instead of shipping the contents in the
application uber-jar.
Note: I did try adding to $SPARK_HOME/lib_managed/jars.
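For reference, the two mechanisms I am aware of (a sketch; paths and class names are made up): --jars ships jars from the submitting machine to the executors, while spark.executor.extraClassPath points executors at jars already copied (e.g. via scp) onto each slave:

    bin/spark-submit --master spark://master:7077 \
      --jars /opt/extra/libA.jar,/opt/extra/libB.jar \
      --class com.example.MyApp myapp.jar

    # or, for jars pre-copied to the same path on every slave:
    bin/spark-submit --master spark://master:7077 \
      --conf spark.executor.extraClassPath=/opt/extra/libA.jar:/opt/extra/libB.jar \
      --class com.example.MyApp myapp.jar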
One option is the databricks/spark-perf project
https://github.com/databricks/spark-perf
2015-07-08 11:23 GMT-07:00 MrAsanjar . afsan...@gmail.com:
Hi all,
What is the most common used tool/product to benchmark spark job?
The following errors are occurring when building with the mvn options clean
package.
Are there some requirements/restrictions on profiles/settings for catalyst
to build properly?
[error]
/shared/sparkup2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:138:
value
Yes, adding that flag does the trick. Thanks.
2015-09-10 13:47 GMT-07:00 Sean Owen <so...@cloudera.com>:
> -Dtest=none ?
>
>
> https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-RunningIndividualTests
>
> On Thu, Sep 10, 2015 at
I have invoked mvn test with the -DwildcardSuites option to specify a
single BinarizerSuite scalatest suite.
The command line is
mvn -pl mllib -Pyarn -Phadoop-2.6 -Dhadoop2.7.1 -Dscala-2.11
-Dmaven.javadoc.skip=true
-DwildcardSuites=org.apache.spark.ml.feature.BinarizerSuite test
The scala
n.
15/09/28 20:07:46 INFO remote.RemoteActorRefProvider$RemotingTerminator:
Remote daemon shut down; proceeding with flushing remote transports.
15/09/28 20:07:46 INFO remote.RemoteActorRefProvider$RemotingTerminator:
Remoting shut down.
Any thoughts?
Thanks
Stephen
ust the last thing that happens to
> fail.
>
> On Sun, Oct 4, 2015 at 7:06 AM, Stephen Boesch <java...@gmail.com> wrote:
> >
> > For a week or two the trunk has not been building for the examples module
> > within intellij. The other modules - including core, sql,
For a week or two the trunk has not been building for the examples module
within IntelliJ. The other modules - including core, sql, mllib, etc. - *are*
working.
A portion of the error message is
"Unable to get dependency information: Unable to read the metadata file for
artifact
Hi Michel, please try local[1] and report back whether the breakpoint is hit.
2015-09-18 7:37 GMT-07:00 Michel Lemay :
> Hi,
>
> I'm adding unit tests to some utility functions that are using
> SparkContext but I'm unable to debug code and hit breakpoints when running
> under
@Yu Fengdong: Your approach - specifically the groupBy - results in a
shuffle, does it not?
2015-12-04 2:02 GMT-08:00 Fengdong Yu :
> There are many ways, one simple is:
>
> such as: you want to know how many rows for each month:
>
>
>
There are solid reasons to have built Spark on the JVM vs Python. The
question for Daniel at this point appears to be Scala vs Java 8. For that
there are many comparisons already available; but in the case of working
with Spark there is the additional benefit on the Scala side that the core
Alternating least squares takes an RDD of (user, product, rating) tuples,
and the resulting model provides predict(user, product) or predictProducts
methods, among others.
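A minimal sketch of that API (mllib; the rank and iteration counts here are arbitrary):

    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    val ratings = sc.parallelize(Seq(Rating(1, 10, 4.0), Rating(1, 20, 1.0), Rating(2, 10, 5.0)))
    val model = ALS.train(ratings, 10 /* rank */, 10 /* iterations */)
    val score = model.predict(2, 20)           // predicted rating for one (user, product) pair
    val top5 = model.recommendProducts(2, 5)   // top recommended products for a user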
The postgres jdbc driver needs to be added to the classpath of your spark
workers. You can do a search for how to do that (multiple ways).
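One way to do that at submit time (a sketch; the jar path/version and application names are illustrative):

    bin/spark-submit \
      --jars /path/to/postgresql-9.4-1201.jdbc41.jar \
      --driver-class-path /path/to/postgresql-9.4-1201.jdbc41.jar \
      --class com.example.MyApp myapp.jar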
2015-12-22 17:22 GMT-08:00 b2k70 :
> I see in the Spark SQL documentation that a temporary table can be created
> directly onto a
Hi Benjamin, yes, by adding it to the thrift server the create table
would work. But querying is performed by the workers, so you need to add it
to the classpath of all nodes for reads to work.
2015-12-22 18:35 GMT-08:00 Benjamin Kim <bbuil...@gmail.com>:
> Hi Stephen,
>
> I fo
r.t. building locally, please specify -Pscala-2.11
>
> Cheers
>
> On Tue, Nov 24, 2015 at 9:58 AM, Stephen Boesch <java...@gmail.com> wrote:
>
>> HI Madabhattula
>> Scala 2.11 requires building from source. Prebuilt binaries are
>> available only for sca
Hi Madabhattula
Scala 2.11 requires building from source. Prebuilt binaries are
available only for Scala 2.10.
From the src folder:
dev/change-scala-version.sh 2.11
Then build as you would normally either from mvn or sbt
The above info *is* included in the spark docs but a little hard
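Concretely, something like the following (the profiles are illustrative; adjust the Hadoop profile to your environment):

    ./dev/change-scala-version.sh 2.11
    mvn -Pyarn -Phadoop-2.6 -Dscala-2.11 -DskipTests clean package

I believe the sbt build accepts the same -Dscala-2.11 property after running the change-scala-version script.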
The following works against a hive table from spark sql
hc.sql("select id,r from (select id, name, rank() over (order by name) as
r from tt2) v where v.r >= 1 and v.r <= 12")
But when using a standard sql context against a temporary table the
following occurs:
Exception in thread "main"
Checked out 1.6.0-SNAPSHOT 60 minutes ago
2015-11-18 19:19 GMT-08:00 Jack Yang <j...@uow.edu.au>:
> Which version of spark are you using?
>
>
>
> *From:* Stephen Boesch [mailto:java...@gmail.com]
> *Sent:* Thursday, 19 November 2015 2:12 PM
> *To:* user
> *Su
But to focus the attention properly: I had already tried out 1.5.2.
2015-11-18 19:46 GMT-08:00 Stephen Boesch <java...@gmail.com>:
> Checked out 1.6.0-SNAPSHOT 60 minutes ago
>
> 2015-11-18 19:19 GMT-08:00 Jack Yang <j...@uow.edu.au>:
>
>> Which version of spark ar
Why is the same query (and actually I tried several variations) working
against a HiveContext and not against the SQLContext?
2015-11-18 19:57 GMT-08:00 Michael Armbrust <mich...@databricks.com>:
> Yes they do.
>
> On Wed, Nov 18, 2015 at 7:49 PM, Stephen Boesch <java..
>> and then use the Hive's dynamic partitioned insert syntax
What does this entail? The same SQL, but you need to do
set hive.exec.dynamic.partition = true;
in the hive/sql context (along with several other related dynamic
partition settings).
Is there anything else/special
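For concreteness, my understanding of what that looks like when issued through a HiveContext (a sketch; table and column names are made up, and the partition column goes last in the select list):

    hc.sql("set hive.exec.dynamic.partition = true")
    hc.sql("set hive.exec.dynamic.partition.mode = nonstrict")
    hc.sql("INSERT OVERWRITE TABLE target PARTITION (dt) " +
           "SELECT id, name, dt FROM staging")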
Out of curiosity, are the tables partitioned on a.pk and b.fk? Hive might be using
copartitioning in that case: it is one of Hive's strengths.
2016-06-09 7:28 GMT-07:00 Gourav Sengupta :
> Hi Mich,
>
> does not Hive use map-reduce? I thought it to be so. And since I am
> running
How many workers (/cpu cores) are assigned to this job?
2016-06-09 13:01 GMT-07:00 SRK :
> Hi,
>
> How to insert data into 2000 partitions(directories) of ORC/parquet at a
> time using Spark SQL? It seems to be not performant when I try to insert
> 2000 directories of
Presently only the mllib version has the one-vs-all approach for
multinomial support. The ml version with ElasticNet support only allows
binary regression.
With feature parity of ml vs mllib having been stated as an objective for
2.0.0 - is there a projected availability of the multinomial
Follow-up: I just encountered the "OneVsRest" classifier in
ml.classification: I will look into using it with the binary
LogisticRegression as the provided classifier.
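A sketch of how I expect to wire that up with the ml pipeline API (training/test DataFrames and parameter values are made up; the default label/features column convention is assumed):

    import org.apache.spark.ml.classification.{LogisticRegression, OneVsRest}

    val lr = new LogisticRegression().setMaxIter(100).setRegParam(0.01)
    val ovr = new OneVsRest().setClassifier(lr)
    val ovrModel = ovr.fit(trainingDF)            // one binary LogisticRegression model per class
    val predictions = ovrModel.transform(testDF)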
2016-05-28 9:06 GMT-07:00 Stephen Boesch <java...@gmail.com>:
>
> Presently only the mllib version has t