Re: Spark development with IntelliJ
I remember seeing this too, but it seemed to be transient. Try compiling again. In my case I recall that IJ was still reimporting some modules when I tried to build. I don't see this error in general.

On Thu, Jan 8, 2015 at 10:38 PM, Bill Bejeck bbej...@gmail.com wrote:

I was having the same issue and that helped. But now I get the following compilation error when trying to run a test from within Intellij (v 14):

    /Users/bbejeck/dev/github_clones/bbejeck-spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
    Error:(308, 109) polymorphic expression cannot be instantiated to expected type;
     found   : [T(in method apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
     required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method functionToUdfBuilder)]
      implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
                                                                                                 ^

Any thoughts?
Re: Spark development with IntelliJ
That worked, thx

On Thu, Jan 8, 2015 at 6:17 PM, Sean Owen so...@cloudera.com wrote:

I remember seeing this too, but it seemed to be transient. Try compiling again. In my case I recall that IJ was still reimporting some modules when I tried to build. I don't see this error in general.

On Thu, Jan 8, 2015 at 10:38 PM, Bill Bejeck bbej...@gmail.com wrote:

I was having the same issue and that helped. But now I get the following compilation error when trying to run a test from within Intellij (v 14):

    /Users/bbejeck/dev/github_clones/bbejeck-spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
    Error:(308, 109) polymorphic expression cannot be instantiated to expected type;
     found   : [T(in method apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
     required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method functionToUdfBuilder)]
      implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
                                                                                                 ^

Any thoughts?
missing document of several messages in actor-based receiver?
Hi TD and other streaming developers,

When I looked at the implementation of the actor-based receiver (ActorReceiver.scala), I found several messages that are not mentioned in the documentation:

    case props: Props =>
      val worker = context.actorOf(props)
      logInfo("Started receiver worker at:" + worker.path)
      sender ! worker

    case (props: Props, name: String) =>
      val worker = context.actorOf(props, name)
      logInfo("Started receiver worker at:" + worker.path)
      sender ! worker

    case _: PossiblyHarmful => hiccups.incrementAndGet()

    case _: Statistics =>
      val workers = context.children
      sender ! Statistics(n.get, workers.size, hiccups.get, workers.mkString("\n"))

Are these hidden intentionally, is the documentation simply incomplete, or did I miss something? Also, are the handlers of these messages buggy? For example, when we start a new worker we don't increase n (the counter of children), and n and hiccups seem unnecessarily declared as AtomicInteger.

Best,

--
Nan Zhu
http://codingcat.me
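[Editor's note] For context, a minimal sketch of the documented side of this API in Spark Streaming 1.2 (assuming the ActorHelper/actorStream API; the class and stream names below are illustrative): a user-defined actor mixes in ActorHelper and calls store(), while the messages quoted above are handled internally by the supervising ActorReceiver.

    import akka.actor.{Actor, Props}
    import org.apache.spark.streaming.receiver.ActorHelper

    // Illustrative custom receiver actor: hands each incoming String to Spark Streaming.
    class MyStringReceiver extends Actor with ActorHelper {
      def receive = {
        case s: String => store(s)
      }
    }

    // With an existing StreamingContext `ssc` (assumed):
    // val lines = ssc.actorStream[String](Props[MyStringReceiver], "MyStringReceiver")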
Re: Spark development with IntelliJ
Side question: Should this section
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-IDESetup
in the wiki link to Useful Developer Tools
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools ?

On Thu Jan 08 2015 at 6:19:55 PM Sean Owen so...@cloudera.com wrote:

I remember seeing this too, but it seemed to be transient. Try compiling again. In my case I recall that IJ was still reimporting some modules when I tried to build. I don't see this error in general.

On Thu, Jan 8, 2015 at 10:38 PM, Bill Bejeck bbej...@gmail.com wrote:

I was having the same issue and that helped. But now I get the following compilation error when trying to run a test from within Intellij (v 14):

    /Users/bbejeck/dev/github_clones/bbejeck-spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
    Error:(308, 109) polymorphic expression cannot be instantiated to expected type;
     found   : [T(in method apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
     required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method functionToUdfBuilder)]
      implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
                                                                                                 ^

Any thoughts?
[ANNOUNCE] Apache Science and Healthcare Track @ApacheCon NA 2015
Hi Folks,

Apologies for cross-posting :( As some of you may already know, ApacheCon NA 2015 is happening in Austin, TX, April 13th-16th. This email is specifically written to attract all folks interested in Science and Healthcare... this is an official call to arms! I am aware that there are many Science- and Healthcare-type people also lingering in the Apache Semantic Web communities, so this one is for all of you folks as well.

Over a number of years the Science track has been emerging as an attractive, exciting, at times mind-blowing non-traditional track running alongside the resident HTTP Server, Big Data, etc. tracks. The Semantic Web track is another such emerging track which has proved popular. This year we want to really get the message out there about how much Apache technology is actually being used in Science and Healthcare. This is not *only* aimed at attracting members of the communities listed at
http://wiki.apache.org/apachecon/ACNA2015ContentCommittee#Target_Projects
but also at potentially attracting a brand new breed of conference participants to ApacheCon
https://wiki.apache.org/apachecon/ApacheCon
and the Foundation, e.g. scientists who love Apache.

We are looking for exciting, invigorating, obscure, half-baked, funky, academic, practical and impractical stories, use cases, experiments and downright successes alike from within the Science domain. The only thing they need to have in common is that they consume, contribute towards, advocate, disseminate or even commercialize Apache technology within the scientific domain and would be relevant to that audience.

It is fully open to interest whether this track be combined with the proposed *healthcare track*... if there is interest in doing this then we can rename this track to Science and Healthcare. In essence one could argue that they are one and the same, however I digress :)

What I would like those of you who are interested to do is merely check out the scope and intent of the Apache in Science content curation which is currently ongoing, and potentially register your interest:
https://wiki.apache.org/apachecon/ACNA2015ContentCommittee#Apache_in_Science

I would love to see the Science and Healthcare track be THE BIGGEST track at ApacheCon, and although we have some way to go, I'm sure many previous track participants will tell you this is not to be missed. We are looking for content from a wide variety of scientific use cases all related to Apache technology.

Thanks in advance and I look forward to seeing you in Austin.

Lewis

--
*Lewis*
Re: Spark development with IntelliJ
Nick - yes. Do you mind moving it? I should have put it in the Contributing to Spark page.

On Thu, Jan 8, 2015 at 3:22 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:

Side question: Should this section
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-IDESetup
in the wiki link to Useful Developer Tools
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools ?

On Thu Jan 08 2015 at 6:19:55 PM Sean Owen so...@cloudera.com wrote:

I remember seeing this too, but it seemed to be transient. Try compiling again. In my case I recall that IJ was still reimporting some modules when I tried to build. I don't see this error in general.

On Thu, Jan 8, 2015 at 10:38 PM, Bill Bejeck bbej...@gmail.com wrote:

I was having the same issue and that helped. But now I get the following compilation error when trying to run a test from within Intellij (v 14):

    /Users/bbejeck/dev/github_clones/bbejeck-spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
    Error:(308, 109) polymorphic expression cannot be instantiated to expected type;
     found   : [T(in method apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
     required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method functionToUdfBuilder)]
      implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
                                                                                                 ^

Any thoughts?
Re: Spark development with IntelliJ
Actually I went ahead and did it.

On Thu, Jan 8, 2015 at 10:25 PM, Patrick Wendell pwend...@gmail.com wrote:

Nick - yes. Do you mind moving it? I should have put it in the Contributing to Spark page.

On Thu, Jan 8, 2015 at 3:22 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:

Side question: Should this section
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-IDESetup
in the wiki link to Useful Developer Tools
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools ?

On Thu Jan 08 2015 at 6:19:55 PM Sean Owen so...@cloudera.com wrote:

I remember seeing this too, but it seemed to be transient. Try compiling again. In my case I recall that IJ was still reimporting some modules when I tried to build. I don't see this error in general.

On Thu, Jan 8, 2015 at 10:38 PM, Bill Bejeck bbej...@gmail.com wrote:

I was having the same issue and that helped. But now I get the following compilation error when trying to run a test from within Intellij (v 14):

    /Users/bbejeck/dev/github_clones/bbejeck-spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
    Error:(308, 109) polymorphic expression cannot be instantiated to expected type;
     found   : [T(in method apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
     required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method functionToUdfBuilder)]
      implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
                                                                                                 ^

Any thoughts?
Re: K-Means And Class Tags
I believe you're running into an erasure issue which we found in DecisionTree too. Check out:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala#L134

That retags RDDs which were created from Java to prevent the exception you're running into.

Hope this helps!
Joseph

On Thu, Jan 8, 2015 at 12:48 PM, Devl Devel devl.developm...@gmail.com wrote:

Thanks for the suggestion, can anyone offer any advice on the ClassCastException going from Java to Scala? Why does JavaRDD.rdd() and then a collect() result in this exception?

On Thu, Jan 8, 2015 at 4:13 PM, Yana Kadiyska yana.kadiy...@gmail.com wrote:

How about

    data.map(s => s.split(","))
        .filter(_.length > 1)
        .map(good_entry => Vectors.dense(Double.parseDouble(good_entry(0)), Double.parseDouble(good_entry(1))))

(full disclosure, I didn't actually run this). But after the first map you should have an RDD[Array[String]], then you'd discard everything shorter than 2 and convert the rest to dense vectors. In fact, if you're expecting length exactly 2 you might want to filter == 2...

On Thu, Jan 8, 2015 at 10:58 AM, Devl Devel devl.developm...@gmail.com wrote:

Hi All,

I'm trying a simple K-Means example as per the website:

    val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble)))

but I'm trying to write a Java-based validation method first so that missing values are omitted or replaced with 0.

    public RDD<Vector> prepareKMeans(JavaRDD<String> data) {
        JavaRDD<Vector> words = data.flatMap(new FlatMapFunction<String, Vector>() {
            public Iterable<Vector> call(String s) {
                String[] split = s.split(",");
                ArrayList<Vector> add = new ArrayList<Vector>();
                if (split.length != 2) {
                    add.add(Vectors.dense(0, 0));
                } else {
                    add.add(Vectors.dense(Double.parseDouble(split[0]),
                                          Double.parseDouble(split[1])));
                }
                return add;
            }
        });
        return words.rdd();
    }

When I then call from Scala:

    val parsedData = dc.prepareKMeans(data);
    val p = parsedData.collect();

I get: Exception in thread "main" java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.spark.mllib.linalg.Vector;

Why is the class tag Object rather than Vector?

1) How do I get this working correctly using the Java validation example above? or
2) How can I modify val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble))) so that when the s.split size is not 2 I ignore the line? or
3) Is there a better way to do input validation first?

Using spark and mllib:

    libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.2.0"
    libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.2.0"

Many thanks in advance
Dev
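[Editor's note] For option (2) above, a minimal sketch assuming `data` is an RDD[String]: doing the parsing and validation entirely on the Scala side builds the RDD with a Vector ClassTag from the start and sidesteps the Java-to-Scala ClassCastException (it still assumes the surviving fields parse as doubles).

    import org.apache.spark.mllib.linalg.{Vector, Vectors}
    import org.apache.spark.rdd.RDD

    // Drop any line that does not split into exactly two fields,
    // then build dense vectors from the survivors.
    def prepareKMeans(data: RDD[String]): RDD[Vector] =
      data.map(_.split(','))
          .filter(_.length == 2)
          .map(parts => Vectors.dense(parts.map(_.toDouble)))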
Re: Spark development with IntelliJ
I was having the same issue and that helped. But now I get the following compilation error when trying to run a test from within IntelliJ (v 14):

    /Users/bbejeck/dev/github_clones/bbejeck-spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
    Error:(308, 109) polymorphic expression cannot be instantiated to expected type;
     found   : [T(in method apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
     required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method functionToUdfBuilder)]
      implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
                                                                                                 ^

Any thoughts?

On Thu, Jan 8, 2015 at 6:33 AM, Jakub Dubovsky spark.dubovsky.ja...@seznam.cz wrote:

Thanks, that helped. I vote for wiki as well. More fine-grained documentation should be on the wiki and linked.

Jakub

-- Original message --
From: Sean Owen so...@cloudera.com
To: Jakub Dubovsky spark.dubovsky.ja...@seznam.cz
Date: 8. 1. 2015 11:29:22
Subject: Re: Spark development with IntelliJ

Yeah, I hit this too. IntelliJ picks this up from the build but then it can't run its own scalac with this plugin added. Go to Preferences > Build, Execution, Deployment > Scala Compiler and clear the "Additional compiler options" field. It will work then, although the option will come back when the project reimports. Right now I don't know of a better fix.

There's another recent open question about updating the IntelliJ docs:
https://issues.apache.org/jira/browse/SPARK-5136

Should this stuff go in the site docs, or the wiki? I vote for the wiki I suppose, and make the site docs point to the wiki. I'd be happy to make wiki edits if I can get permission, or propose this text along with other new text on the JIRA.

On Thu, Jan 8, 2015 at 10:00 AM, Jakub Dubovsky spark.dubovsky.ja...@seznam.cz wrote:

Hi devs,

I'd like to ask if anybody has experience with using IntelliJ 14 to step into Spark code. Whatever I try I get a compilation error:

    Error:scalac: bad option: -P:/home/jakub/.m2/repository/org/scalamacros/paradise_2.10.4/2.0.1/paradise_2.10.4-2.0.1.jar

The project is set up following Patrick's instructions [1] and packaged by mvn -DskipTests clean install. Compilation works fine. Then I just created a breakpoint in test code and ran debug, and got the error.

Thanks for any hints
Jakub

[1] https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-BuildingSparkinIntelliJIDEA
PR #3872
Could one of the admins take a look at PR 3872 (JIRA 3299), submitted on 1/1?
Spark development with IntelliJ
Hi devs,

I'd like to ask if anybody has experience with using IntelliJ 14 to step into Spark code. Whatever I try I get a compilation error:

    Error:scalac: bad option: -P:/home/jakub/.m2/repository/org/scalamacros/paradise_2.10.4/2.0.1/paradise_2.10.4-2.0.1.jar

The project is set up following Patrick's instructions [1] and packaged by mvn -DskipTests clean install. Compilation works fine. Then I just created a breakpoint in test code and ran debug, and got the error.

Thanks for any hints
Jakub

[1] https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-BuildingSparkinIntelliJIDEA
Re: Maintainer for Mesos
Hi Andrew,

Patrick Wendell and Andrew Or have committed previous patches related to Mesos. Maybe they would be good committers to look at it?

RJ

On Mon, Jan 5, 2015 at 6:40 PM, Andrew Ash and...@andrewash.com wrote:

Hi Spark devs,

I'm interested in having a committer look at a PR [1] for Mesos, but there's not an entry for Mesos in the maintainer specialties on the wiki [2]. Which Spark committers have expertise in the Mesos features?

Thanks!
Andrew

[1] https://github.com/apache/spark/pull/3074
[2] https://cwiki.apache.org/confluence/display/SPARK/Committers#Committers-ReviewProcessandMaintainers
Re: Spark on teradata?
I don't think this makes sense. The Teradata database is a standard (if parallel) RDBMS, while Spark is used for non-relational workloads. What could make sense is to deploy Spark on Teradata Aster. Aster is a database cluster that can call external programs via its STREAM operator; that way a Spark/Scala app can be invoked to process some data. The deployment itself should be easy; the potential benefit is hard to say...

Hope this helps,
Tomas
Re: Results of tests
Here it is:

    [centos] $ /home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.0.5/bin/mvn -DHADOOP_PROFILE=hadoop-2.4 -Dlabel=centos -DskipTests -Phadoop-2.4 -Pyarn -Phive clean package

You can find the above in
https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/consoleFull

Cheers

On Thu, Jan 8, 2015 at 8:05 AM, Tony Reix tony.r...@bull.net wrote:

Thanks! I've been able to see that there are 3745 tests for version 1.2.0 with the Hadoop 2.4 profile. However, on my side, the maximum number of tests I've seen is 3485... about 300 tests are missing on my side. Which Maven options were used to produce the report file behind this page?
https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/
(I'm not authorized to look at the configuration part.)

Thx!
Tony

--
From: Ted Yu [yuzhih...@gmail.com]
Sent: Thursday, 8 January 2015 16:11
To: Tony Reix
Cc: dev@spark.apache.org
Subject: Re: Results of tests

Please take a look at https://amplab.cs.berkeley.edu/jenkins/view/Spark/

On Thu, Jan 8, 2015 at 5:40 AM, Tony Reix tony.r...@bull.net wrote:

Hi,

I'm checking that Spark works fine in a new environment (PPC64 hardware). I've found some issues with versions 1.1.0, 1.1.1, and 1.2.0, even when running on Ubuntu on x86_64 with the Oracle JVM. I'd like to know where I can find the results of the Spark tests, for each version and for the different profiles, in order to have a reference to compare my results with. I cannot find them on the Spark web site.

Thx
Tony
Re: Registering custom metrics
Very interesting approach. Thanks for sharing it!

On Thu, Jan 8, 2015 at 5:30 PM, Enno Shioji eshi...@gmail.com wrote:

FYI I found this approach by Ooyala.

    /** Instrumentation for Spark based on accumulators.
      *
      * Usage:
      *   val instrumentation = new SparkInstrumentation("example.metrics")
      *   val numReqs = sc.accumulator(0L)
      *   instrumentation.source.registerDailyAccumulator(numReqs, "numReqs")
      *   instrumentation.register()
      *
      * Will create and report the following metrics:
      * - Gauge with total number of requests (daily)
      * - Meter with rate of requests
      *
      * @param prefix prefix for all metrics that will be reported by this Instrumentation
      */

https://gist.github.com/ibuenros/9b94736c2bad2f4b8e23

On Mon, Jan 5, 2015 at 2:56 PM, Enno Shioji eshi...@gmail.com wrote:

Hi Gerard,

Thanks for the answer! I had a good look at it, but I couldn't figure out whether one can use it to emit metrics from application code. Suppose I wanted to monitor the rate of bytes I produce, like so:

    stream
      .map { input =>
        val bytes = produce(input)
        // metricRegistry.meter("some.metrics").mark(bytes.length)
        bytes
      }
      .saveAsTextFile("text")

Is there a way to achieve this with the MetricsSystem?

On Mon, Jan 5, 2015 at 10:24 AM, Gerard Maas gerard.m...@gmail.com wrote:

Hi,

Yes, I managed to register custom metrics by creating an implementation of org.apache.spark.metrics.source.Source and registering it with the metrics subsystem. Source is [Spark] private, so you need to create it under an org.apache.spark package. In my case I'm dealing with Spark Streaming metrics, and I created my CustomStreamingSource under org.apache.spark.streaming as I also needed access to some [Streaming] private components.

Then you register your new metric Source with Spark's metrics system, like so:

    SparkEnv.get.metricsSystem.registerSource(customStreamingSource)

And it will get reported to the metrics sinks active on your system. By default, you can access them through the metrics endpoint: http://driver-host:ui-port/metrics/json

I hope this helps.

-kr, Gerard.

On Tue, Dec 30, 2014 at 3:32 PM, eshioji eshi...@gmail.com wrote:

Hi,

Did you find a way to do this, or are you working on it? I'm trying to find a way to do this as well, but haven't been able to find one.
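[Editor's note] To make Gerard's recipe concrete, here is a minimal sketch, assuming Spark 1.x where the Source trait is private[spark] and must therefore live under an org.apache.spark package; the class and metric names are illustrative, not part of Spark's API.

    package org.apache.spark.metrics.source

    import com.codahale.metrics.MetricRegistry

    // Illustrative custom source; sourceName becomes the metric namespace.
    class CustomStreamingSource extends Source {
      override val sourceName: String = "custom.streaming"
      override val metricRegistry: MetricRegistry = new MetricRegistry()

      // Example meter that driver-side code can mark().
      val bytesProduced = metricRegistry.meter(MetricRegistry.name("produced", "bytes"))
    }

Registration then follows the line Gerard quotes, typically once on the driver:

    // SparkEnv.get.metricsSystem.registerSource(new CustomStreamingSource)

Note that a source registered this way lives in the JVM where it was created; updating it from inside a transformation running on executors is not straightforward, which is presumably why the accumulator-based Ooyala approach above exists.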
Re: Spark on teradata?
Depending on your use case: if it is to extract a small amount of data out of Teradata, then you can use the JdbcRDD, and soon a JDBC input source based on the new Spark SQL external data source API.

On Wed, Jan 7, 2015 at 7:14 AM, gen tang gen.tan...@gmail.com wrote:

Hi,

I have a stupid question: is it possible to use Spark with a Teradata data warehouse? I read some news on the internet which says yes; however, I didn't find any example of this.

Thanks in advance.
Cheers
Gen
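[Editor's note] For reference, a minimal sketch of the JdbcRDD route against the Spark 1.2 API; the Teradata driver class, JDBC URL, table, columns and bounds below are illustrative assumptions, not something tested against Teradata.

    import java.sql.{DriverManager, ResultSet}
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.JdbcRDD

    def teradataRdd(sc: SparkContext) =
      // JdbcRDD requires a SQL statement with exactly two '?' placeholders,
      // which it fills with per-partition lower/upper bounds.
      new JdbcRDD(
        sc,
        () => {
          Class.forName("com.teradata.jdbc.TeraDriver") // assumed driver class
          DriverManager.getConnection("jdbc:teradata://host/database", "user", "password")
        },
        "SELECT id, payload FROM my_table WHERE id >= ? AND id <= ?",
        1L,        // lowerBound
        1000000L,  // upperBound
        10,        // numPartitions
        (rs: ResultSet) => (rs.getLong(1), rs.getString(2))
      )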
Re: K-Means And Class Tags
Thanks for the suggestion, can anyone offer any advice on the ClassCastException going from Java to Scala? Why does JavaRDD.rdd() and then a collect() result in this exception?

On Thu, Jan 8, 2015 at 4:13 PM, Yana Kadiyska yana.kadiy...@gmail.com wrote:

How about

    data.map(s => s.split(","))
        .filter(_.length > 1)
        .map(good_entry => Vectors.dense(Double.parseDouble(good_entry(0)), Double.parseDouble(good_entry(1))))

(full disclosure, I didn't actually run this). But after the first map you should have an RDD[Array[String]], then you'd discard everything shorter than 2 and convert the rest to dense vectors. In fact, if you're expecting length exactly 2 you might want to filter == 2...

On Thu, Jan 8, 2015 at 10:58 AM, Devl Devel devl.developm...@gmail.com wrote:

Hi All,

I'm trying a simple K-Means example as per the website:

    val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble)))

but I'm trying to write a Java-based validation method first so that missing values are omitted or replaced with 0.

    public RDD<Vector> prepareKMeans(JavaRDD<String> data) {
        JavaRDD<Vector> words = data.flatMap(new FlatMapFunction<String, Vector>() {
            public Iterable<Vector> call(String s) {
                String[] split = s.split(",");
                ArrayList<Vector> add = new ArrayList<Vector>();
                if (split.length != 2) {
                    add.add(Vectors.dense(0, 0));
                } else {
                    add.add(Vectors.dense(Double.parseDouble(split[0]),
                                          Double.parseDouble(split[1])));
                }
                return add;
            }
        });
        return words.rdd();
    }

When I then call from Scala:

    val parsedData = dc.prepareKMeans(data);
    val p = parsedData.collect();

I get: Exception in thread "main" java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.spark.mllib.linalg.Vector;

Why is the class tag Object rather than Vector?

1) How do I get this working correctly using the Java validation example above? or
2) How can I modify val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble))) so that when the s.split size is not 2 I ignore the line? or
3) Is there a better way to do input validation first?

Using spark and mllib:

    libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.2.0"
    libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.2.0"

Many thanks in advance
Dev
Re: K-Means And Class Tags
Thanks for the suggestion, can anyone offer any advice on the ClassCastException going from Java to Scala? Why does calling JavaRDD.rdd() and then collect() result in this exception?