Re: Spark development with IntelliJ
I remember seeing this too, but it seemed to be transient. Try compiling again. In my case I recall that IJ was still reimporting some modules when I tried to build. I don't see this error in general.

On Thu, Jan 8, 2015 at 10:38 PM, Bill Bejeck bbej...@gmail.com wrote:

I was having the same issue and that helped. But now I get the following compilation error when trying to run a test from within Intellij (v 14):

    /Users/bbejeck/dev/github_clones/bbejeck-spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
    Error:(308, 109) polymorphic expression cannot be instantiated to expected type;
     found   : [T(in method apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
     required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method functionToUdfBuilder)]
      implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
                                                                                                 ^

Any thoughts?
Re: Spark development with IntelliJ
That worked, thx

On Thu, Jan 8, 2015 at 6:17 PM, Sean Owen so...@cloudera.com wrote:

I remember seeing this too, but it seemed to be transient. Try compiling again. In my case I recall that IJ was still reimporting some modules when I tried to build. I don't see this error in general.

On Thu, Jan 8, 2015 at 10:38 PM, Bill Bejeck bbej...@gmail.com wrote:

I was having the same issue and that helped. But now I get the following compilation error when trying to run a test from within Intellij (v 14):

    /Users/bbejeck/dev/github_clones/bbejeck-spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
    Error:(308, 109) polymorphic expression cannot be instantiated to expected type;
     found   : [T(in method apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
     required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method functionToUdfBuilder)]
      implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
                                                                                                 ^

Any thoughts?
missing document of several messages in actor-based receiver?
Hi TD and other streaming developers,

When I looked at the implementation of the actor-based receiver (ActorReceiver.scala), I found several messages that are not mentioned in the documentation:

    case props: Props =>
      val worker = context.actorOf(props)
      logInfo("Started receiver worker at:" + worker.path)
      sender ! worker

    case (props: Props, name: String) =>
      val worker = context.actorOf(props, name)
      logInfo("Started receiver worker at:" + worker.path)
      sender ! worker

    case _: PossiblyHarmful => hiccups.incrementAndGet()

    case _: Statistics =>
      val workers = context.children
      sender ! Statistics(n.get, workers.size, hiccups.get, workers.mkString("\n"))

Are these hidden intentionally, is the documentation simply incomplete, or did I miss something? Also, are the handlers of these messages buggy? For example, when we start a new worker we don't increase n (the counter of children), and n and hiccups seem unnecessarily declared as AtomicInteger.

Best,

--
Nan Zhu
http://codingcat.me
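[Editor's note] For context, a minimal sketch of the documented side of this API in Spark Streaming 1.2 (assuming the ActorHelper/actorStream API; the class and stream names below are illustrative): a user-defined actor mixes in ActorHelper and calls store(), while the messages quoted above are handled internally by the supervising ActorReceiver.

    import akka.actor.{Actor, Props}
    import org.apache.spark.streaming.receiver.ActorHelper

    // Illustrative custom receiver actor: hands each incoming String to Spark Streaming.
    class MyStringReceiver extends Actor with ActorHelper {
      def receive = {
        case s: String => store(s)
      }
    }

    // With an existing StreamingContext `ssc` (assumed):
    // val lines = ssc.actorStream[String](Props[MyStringReceiver], "MyStringReceiver")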
Re: Spark development with IntelliJ
Side question: Should this section
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-IDESetup
in the wiki link to Useful Developer Tools
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools ?

On Thu Jan 08 2015 at 6:19:55 PM Sean Owen so...@cloudera.com wrote:

I remember seeing this too, but it seemed to be transient. Try compiling again. In my case I recall that IJ was still reimporting some modules when I tried to build. I don't see this error in general.

On Thu, Jan 8, 2015 at 10:38 PM, Bill Bejeck bbej...@gmail.com wrote:

I was having the same issue and that helped. But now I get the following compilation error when trying to run a test from within Intellij (v 14):

    /Users/bbejeck/dev/github_clones/bbejeck-spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
    Error:(308, 109) polymorphic expression cannot be instantiated to expected type;
     found   : [T(in method apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
     required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method functionToUdfBuilder)]
      implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
                                                                                                 ^

Any thoughts?
[ANNOUNCE] Apache Science and Healthcare Track @ApacheCon NA 2015
Hi Folks,

Apologies for cross-posting :( As some of you may already know, ApacheCon NA 2015 is happening in Austin, TX, April 13th-16th. This email is specifically written to attract all folks interested in Science and Healthcare... this is an official call to arms! I am aware that there are many Science- and Healthcare-type people also lingering in the Apache Semantic Web communities, so this one is for all of you folks as well.

Over a number of years the Science track has been emerging as an attractive, exciting, at times mind-blowing non-traditional track running alongside the resident HTTP Server, Big Data, etc. tracks. The Semantic Web track is another such emerging track which has proved popular. This year we want to really get the message out there about how much Apache technology is actually being used in Science and Healthcare. This is not *only* aimed at attracting members of the communities listed at
http://wiki.apache.org/apachecon/ACNA2015ContentCommittee#Target_Projects
but also at potentially attracting a brand new breed of conference participants to ApacheCon
https://wiki.apache.org/apachecon/ApacheCon
and the Foundation, e.g. scientists who love Apache.

We are looking for exciting, invigorating, obscure, half-baked, funky, academic, practical and impractical stories, use cases, experiments and downright successes alike from within the Science domain. The only thing they need to have in common is that they consume, contribute towards, advocate, disseminate or even commercialize Apache technology within the scientific domain and would be relevant to that audience.

It is fully open to interest whether this track be combined with the proposed *healthcare track*... if there is interest in doing this then we can rename this track to Science and Healthcare. In essence one could argue that they are one and the same, however I digress :)

What I would like those of you who are interested to do is merely check out the scope and intent of the Apache in Science content curation which is currently ongoing, and potentially register your interest:
https://wiki.apache.org/apachecon/ACNA2015ContentCommittee#Apache_in_Science

I would love to see the Science and Healthcare track be THE BIGGEST track at ApacheCon, and although we have some way to go, I'm sure many previous track participants will tell you this is not to be missed. We are looking for content from a wide variety of scientific use cases all related to Apache technology.

Thanks in advance and I look forward to seeing you in Austin.

Lewis

--
*Lewis*
Re: Spark development with IntelliJ
Nick - yes. Do you mind moving it? I should have put it in the Contributing to Spark page.

On Thu, Jan 8, 2015 at 3:22 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:

Side question: Should this section
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-IDESetup
in the wiki link to Useful Developer Tools
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools ?

On Thu Jan 08 2015 at 6:19:55 PM Sean Owen so...@cloudera.com wrote:

I remember seeing this too, but it seemed to be transient. Try compiling again. In my case I recall that IJ was still reimporting some modules when I tried to build. I don't see this error in general.

On Thu, Jan 8, 2015 at 10:38 PM, Bill Bejeck bbej...@gmail.com wrote:

I was having the same issue and that helped. But now I get the following compilation error when trying to run a test from within Intellij (v 14):

    /Users/bbejeck/dev/github_clones/bbejeck-spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
    Error:(308, 109) polymorphic expression cannot be instantiated to expected type;
     found   : [T(in method apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
     required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method functionToUdfBuilder)]
      implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
                                                                                                 ^

Any thoughts?
Re: Spark development with IntelliJ
Actually I went ahead and did it.

On Thu, Jan 8, 2015 at 10:25 PM, Patrick Wendell pwend...@gmail.com wrote:

Nick - yes. Do you mind moving it? I should have put it in the Contributing to Spark page.

On Thu, Jan 8, 2015 at 3:22 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:

Side question: Should this section
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-IDESetup
in the wiki link to Useful Developer Tools
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools ?

On Thu Jan 08 2015 at 6:19:55 PM Sean Owen so...@cloudera.com wrote:

I remember seeing this too, but it seemed to be transient. Try compiling again. In my case I recall that IJ was still reimporting some modules when I tried to build. I don't see this error in general.

On Thu, Jan 8, 2015 at 10:38 PM, Bill Bejeck bbej...@gmail.com wrote:

I was having the same issue and that helped. But now I get the following compilation error when trying to run a test from within Intellij (v 14):

    /Users/bbejeck/dev/github_clones/bbejeck-spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
    Error:(308, 109) polymorphic expression cannot be instantiated to expected type;
     found   : [T(in method apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
     required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method functionToUdfBuilder)]
      implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
                                                                                                 ^

Any thoughts?
Re: K-Means And Class Tags
I believe you're running into an erasure issue which we found in DecisionTree too. Check out:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala#L134

That retags RDDs which were created from Java to prevent the exception you're running into.

Hope this helps!
Joseph

On Thu, Jan 8, 2015 at 12:48 PM, Devl Devel devl.developm...@gmail.com wrote:

Thanks for the suggestion, can anyone offer any advice on the ClassCastException going from Java to Scala? Why does JavaRDD.rdd() and then a collect() result in this exception?

On Thu, Jan 8, 2015 at 4:13 PM, Yana Kadiyska yana.kadiy...@gmail.com wrote:

How about

    data.map(s => s.split(","))
        .filter(_.length > 1)
        .map(good_entry => Vectors.dense(Double.parseDouble(good_entry(0)), Double.parseDouble(good_entry(1))))

(full disclosure, I didn't actually run this). But after the first map you should have an RDD[Array[String]], then you'd discard everything shorter than 2 and convert the rest to dense vectors. In fact, if you're expecting length exactly 2 you might want to filter == 2...

On Thu, Jan 8, 2015 at 10:58 AM, Devl Devel devl.developm...@gmail.com wrote:

Hi All,

I'm trying a simple K-Means example as per the website:

    val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble)))

but I'm trying to write a Java-based validation method first so that missing values are omitted or replaced with 0.

    public RDD<Vector> prepareKMeans(JavaRDD<String> data) {
        JavaRDD<Vector> words = data.flatMap(new FlatMapFunction<String, Vector>() {
            public Iterable<Vector> call(String s) {
                String[] split = s.split(",");
                ArrayList<Vector> add = new ArrayList<Vector>();
                if (split.length != 2) {
                    add.add(Vectors.dense(0, 0));
                } else {
                    add.add(Vectors.dense(Double.parseDouble(split[0]),
                                          Double.parseDouble(split[1])));
                }
                return add;
            }
        });
        return words.rdd();
    }

When I then call from Scala:

    val parsedData = dc.prepareKMeans(data);
    val p = parsedData.collect();

I get: Exception in thread "main" java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.spark.mllib.linalg.Vector;

Why is the class tag Object rather than Vector?

1) How do I get this working correctly using the Java validation example above? or
2) How can I modify val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble))) so that when the s.split size is not 2 I ignore the line? or
3) Is there a better way to do input validation first?

Using spark and mllib:

    libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.2.0"
    libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.2.0"

Many thanks in advance
Dev
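[Editor's note] For option (2) above, a minimal sketch assuming `data` is an RDD[String]: doing the parsing and validation entirely on the Scala side builds the RDD with a Vector ClassTag from the start and sidesteps the Java-to-Scala ClassCastException (it still assumes the surviving fields parse as doubles).

    import org.apache.spark.mllib.linalg.{Vector, Vectors}
    import org.apache.spark.rdd.RDD

    // Drop any line that does not split into exactly two fields,
    // then build dense vectors from the survivors.
    def prepareKMeans(data: RDD[String]): RDD[Vector] =
      data.map(_.split(','))
          .filter(_.length == 2)
          .map(parts => Vectors.dense(parts.map(_.toDouble)))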
Re: Spark development with IntelliJ
I was having the same issue and that helped. But now I get the following compilation error when trying to run a test from within IntelliJ (v 14):

    /Users/bbejeck/dev/github_clones/bbejeck-spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
    Error:(308, 109) polymorphic expression cannot be instantiated to expected type;
     found   : [T(in method apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
     required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method functionToUdfBuilder)]
      implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
                                                                                                 ^

Any thoughts?

On Thu, Jan 8, 2015 at 6:33 AM, Jakub Dubovsky spark.dubovsky.ja...@seznam.cz wrote:

Thanks, that helped. I vote for wiki as well. More fine-grained documentation should be on the wiki and linked.

Jakub

-- Original message --
From: Sean Owen so...@cloudera.com
To: Jakub Dubovsky spark.dubovsky.ja...@seznam.cz
Date: 8. 1. 2015 11:29:22
Subject: Re: Spark development with IntelliJ

Yeah, I hit this too. IntelliJ picks this up from the build but then it can't run its own scalac with this plugin added. Go to Preferences > Build, Execution, Deployment > Scala Compiler and clear the "Additional compiler options" field. It will work then, although the option will come back when the project reimports. Right now I don't know of a better fix.

There's another recent open question about updating the IntelliJ docs:
https://issues.apache.org/jira/browse/SPARK-5136

Should this stuff go in the site docs, or the wiki? I vote for the wiki I suppose, and make the site docs point to the wiki. I'd be happy to make wiki edits if I can get permission, or propose this text along with other new text on the JIRA.

On Thu, Jan 8, 2015 at 10:00 AM, Jakub Dubovsky spark.dubovsky.ja...@seznam.cz wrote:

Hi devs,

I'd like to ask if anybody has experience with using IntelliJ 14 to step into Spark code. Whatever I try I get a compilation error:

    Error:scalac: bad option: -P:/home/jakub/.m2/repository/org/scalamacros/paradise_2.10.4/2.0.1/paradise_2.10.4-2.0.1.jar

The project is set up following Patrick's instructions [1] and packaged by mvn -DskipTests clean install. Compilation works fine. Then I just created a breakpoint in test code and ran debug, and got the error.

Thanks for any hints
Jakub

[1] https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-BuildingSparkinIntelliJIDEA
PR #3872
Could one of the admins take a look at PR 3872 (JIRA 3299), submitted on 1/1?
Spark development with IntelliJ
Hi devs,

I'd like to ask if anybody has experience with using IntelliJ 14 to step into Spark code. Whatever I try I get a compilation error:

    Error:scalac: bad option: -P:/home/jakub/.m2/repository/org/scalamacros/paradise_2.10.4/2.0.1/paradise_2.10.4-2.0.1.jar

The project is set up following Patrick's instructions [1] and packaged by mvn -DskipTests clean install. Compilation works fine. Then I just created a breakpoint in test code and ran debug, and got the error.

Thanks for any hints
Jakub

[1] https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-BuildingSparkinIntelliJIDEA
Re: Maintainer for Mesos
Hi Andrew,

Patrick Wendell and Andrew Or have committed previous patches related to Mesos. Maybe they would be good committers to look at it?

RJ

On Mon, Jan 5, 2015 at 6:40 PM, Andrew Ash and...@andrewash.com wrote:

Hi Spark devs,

I'm interested in having a committer look at a PR [1] for Mesos, but there's not an entry for Mesos in the maintainer specialties on the wiki [2]. Which Spark committers have expertise in the Mesos features?

Thanks!
Andrew

[1] https://github.com/apache/spark/pull/3074
[2] https://cwiki.apache.org/confluence/display/SPARK/Committers#Committers-ReviewProcessandMaintainers
Re: Spark on teradata?
I don't think this makes sense. The Teradata database is a standard (if parallel) RDBMS, while Spark is used for non-relational workloads. What could make sense is to deploy Spark on Teradata Aster. Aster is a database cluster that can call external programs via its STREAM operator; that way a Spark/Scala app can be invoked to process some data. The deployment itself should be easy; the potential benefit is hard to say...

Hope this helps,
Tomas
Re: Results of tests
Here it is:

    [centos] $ /home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.0.5/bin/mvn -DHADOOP_PROFILE=hadoop-2.4 -Dlabel=centos -DskipTests -Phadoop-2.4 -Pyarn -Phive clean package

You can find the above in
https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/consoleFull

Cheers

On Thu, Jan 8, 2015 at 8:05 AM, Tony Reix tony.r...@bull.net wrote:

Thanks! I've been able to see that there are 3745 tests for version 1.2.0 with the Hadoop 2.4 profile. However, on my side, the maximum number of tests I've seen is 3485... about 300 tests are missing on my side. Which Maven options were used to produce the report file behind this page?
https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/
(I'm not authorized to look at the configuration part.)

Thx!
Tony

--
From: Ted Yu [yuzhih...@gmail.com]
Sent: Thursday, 8 January 2015 16:11
To: Tony Reix
Cc: dev@spark.apache.org
Subject: Re: Results of tests

Please take a look at https://amplab.cs.berkeley.edu/jenkins/view/Spark/

On Thu, Jan 8, 2015 at 5:40 AM, Tony Reix tony.r...@bull.net wrote:

Hi,

I'm checking that Spark works fine in a new environment (PPC64 hardware). I've found some issues with versions 1.1.0, 1.1.1, and 1.2.0, even when running on Ubuntu on x86_64 with the Oracle JVM. I'd like to know where I can find the results of the Spark tests, for each version and for the different profiles, in order to have a reference to compare my results with. I cannot find them on the Spark web site.

Thx
Tony
Re: Registering custom metrics
Very interesting approach. Thanks for sharing it!

On Thu, Jan 8, 2015 at 5:30 PM, Enno Shioji eshi...@gmail.com wrote:

FYI I found this approach by Ooyala.

    /** Instrumentation for Spark based on accumulators.
      *
      * Usage:
      *   val instrumentation = new SparkInstrumentation("example.metrics")
      *   val numReqs = sc.accumulator(0L)
      *   instrumentation.source.registerDailyAccumulator(numReqs, "numReqs")
      *   instrumentation.register()
      *
      * Will create and report the following metrics:
      * - Gauge with total number of requests (daily)
      * - Meter with rate of requests
      *
      * @param prefix prefix for all metrics that will be reported by this Instrumentation
      */

https://gist.github.com/ibuenros/9b94736c2bad2f4b8e23

On Mon, Jan 5, 2015 at 2:56 PM, Enno Shioji eshi...@gmail.com wrote:

Hi Gerard,

Thanks for the answer! I had a good look at it, but I couldn't figure out whether one can use it to emit metrics from application code. Suppose I wanted to monitor the rate of bytes I produce, like so:

    stream
      .map { input =>
        val bytes = produce(input)
        // metricRegistry.meter("some.metrics").mark(bytes.length)
        bytes
      }
      .saveAsTextFile("text")

Is there a way to achieve this with the MetricsSystem?

On Mon, Jan 5, 2015 at 10:24 AM, Gerard Maas gerard.m...@gmail.com wrote:

Hi,

Yes, I managed to register custom metrics by creating an implementation of org.apache.spark.metrics.source.Source and registering it with the metrics subsystem. Source is [Spark] private, so you need to create it under an org.apache.spark package. In my case I'm dealing with Spark Streaming metrics, and I created my CustomStreamingSource under org.apache.spark.streaming as I also needed access to some [Streaming] private components.

Then you register your new metric Source with Spark's metrics system, like so:

    SparkEnv.get.metricsSystem.registerSource(customStreamingSource)

And it will get reported to the metrics sinks active on your system. By default, you can access them through the metrics endpoint: http://driver-host:ui-port/metrics/json

I hope this helps.

-kr, Gerard.

On Tue, Dec 30, 2014 at 3:32 PM, eshioji eshi...@gmail.com wrote:

Hi,

Did you find a way to do this, or are you working on it? I'm trying to find a way to do this as well, but haven't been able to find one.
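[Editor's note] To make Gerard's recipe concrete, here is a minimal sketch, assuming Spark 1.x where the Source trait is private[spark] and must therefore live under an org.apache.spark package; the class and metric names are illustrative, not part of Spark's API.

    package org.apache.spark.metrics.source

    import com.codahale.metrics.MetricRegistry

    // Illustrative custom source; sourceName becomes the metric namespace.
    class CustomStreamingSource extends Source {
      override val sourceName: String = "custom.streaming"
      override val metricRegistry: MetricRegistry = new MetricRegistry()

      // Example meter that driver-side code can mark().
      val bytesProduced = metricRegistry.meter(MetricRegistry.name("produced", "bytes"))
    }

Registration then follows the line Gerard quotes, typically once on the driver:

    // SparkEnv.get.metricsSystem.registerSource(new CustomStreamingSource)

Note that a source registered this way lives in the JVM where it was created; updating it from inside a transformation running on executors is not straightforward, which is presumably why the accumulator-based Ooyala approach above exists.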
Re: Spark on teradata?
Depending on your use case: if it is to extract a small amount of data out of Teradata, then you can use the JdbcRDD, and soon a JDBC input source based on the new Spark SQL external data source API.

On Wed, Jan 7, 2015 at 7:14 AM, gen tang gen.tan...@gmail.com wrote:

Hi,

I have a stupid question: is it possible to use Spark with a Teradata data warehouse? I read some news on the internet which says yes; however, I didn't find any example of this.

Thanks in advance.
Cheers
Gen
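[Editor's note] For reference, a minimal sketch of the JdbcRDD route against the Spark 1.2 API; the Teradata driver class, JDBC URL, table, columns and bounds below are illustrative assumptions, not something tested against Teradata.

    import java.sql.{DriverManager, ResultSet}
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.JdbcRDD

    def teradataRdd(sc: SparkContext) =
      // JdbcRDD requires a SQL statement with exactly two '?' placeholders,
      // which it fills with per-partition lower/upper bounds.
      new JdbcRDD(
        sc,
        () => {
          Class.forName("com.teradata.jdbc.TeraDriver") // assumed driver class
          DriverManager.getConnection("jdbc:teradata://host/database", "user", "password")
        },
        "SELECT id, payload FROM my_table WHERE id >= ? AND id <= ?",
        1L,        // lowerBound
        1000000L,  // upperBound
        10,        // numPartitions
        (rs: ResultSet) => (rs.getLong(1), rs.getString(2))
      )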
Re: K-Means And Class Tags
Thanks for the suggestion, can anyone offer any advice on the ClassCastException going from Java to Scala? Why does JavaRDD.rdd() and then a collect() result in this exception?

On Thu, Jan 8, 2015 at 4:13 PM, Yana Kadiyska yana.kadiy...@gmail.com wrote:

How about

    data.map(s => s.split(","))
        .filter(_.length > 1)
        .map(good_entry => Vectors.dense(Double.parseDouble(good_entry(0)), Double.parseDouble(good_entry(1))))

(full disclosure, I didn't actually run this). But after the first map you should have an RDD[Array[String]], then you'd discard everything shorter than 2 and convert the rest to dense vectors. In fact, if you're expecting length exactly 2 you might want to filter == 2...

On Thu, Jan 8, 2015 at 10:58 AM, Devl Devel devl.developm...@gmail.com wrote:

Hi All,

I'm trying a simple K-Means example as per the website:

    val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble)))

but I'm trying to write a Java-based validation method first so that missing values are omitted or replaced with 0.

    public RDD<Vector> prepareKMeans(JavaRDD<String> data) {
        JavaRDD<Vector> words = data.flatMap(new FlatMapFunction<String, Vector>() {
            public Iterable<Vector> call(String s) {
                String[] split = s.split(",");
                ArrayList<Vector> add = new ArrayList<Vector>();
                if (split.length != 2) {
                    add.add(Vectors.dense(0, 0));
                } else {
                    add.add(Vectors.dense(Double.parseDouble(split[0]),
                                          Double.parseDouble(split[1])));
                }
                return add;
            }
        });
        return words.rdd();
    }

When I then call from Scala:

    val parsedData = dc.prepareKMeans(data);
    val p = parsedData.collect();

I get: Exception in thread "main" java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.spark.mllib.linalg.Vector;

Why is the class tag Object rather than Vector?

1) How do I get this working correctly using the Java validation example above? or
2) How can I modify val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble))) so that when the s.split size is not 2 I ignore the line? or
3) Is there a better way to do input validation first?

Using spark and mllib:

    libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.2.0"
    libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.2.0"

Many thanks in advance
Dev
Re: K-Means And Class Tags
Thanks for the suggestion, can anyone offer any advice on the ClassCastException going from Java to Scala? Why does calling JavaRDD.rdd() and then collect() result in this exception?