Re: Cosine Similarity between documents - Rows

2017-11-27 Thread Ge, Yao (Y.)
You are essentially doing document clustering. K-means will do it. You do have to specify the number of clusters up front.
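
A minimal sketch of that pipeline with MLlib (the file name docs.txt and k = 10 are illustrative, not from the thread). L2-normalizing the TF-IDF vectors ties Euclidean k-means back to the cosine similarity in the subject line: for unit vectors, ||a - b||^2 = 2(1 - cos(a, b)), so clustering normalized vectors approximates cosine-based clustering.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.feature.{HashingTF, IDF, Normalizer}

    val sc = new SparkContext(new SparkConf().setAppName("doc-kmeans"))

    // One document per line; whitespace tokenization is a simplification
    val docs = sc.textFile("docs.txt").map(_.toLowerCase.split("\\s+").toSeq)

    // TF-IDF features, L2-normalized so Euclidean k-means tracks cosine similarity
    val tf = new HashingTF().transform(docs).cache()
    val tfidf = new IDF().fit(tf).transform(tf)
    val normalized = new Normalizer().transform(tfidf)

    // The number of clusters must be chosen up front, as noted above
    val model = KMeans.train(normalized, 10, 20) // k = 10, 20 iterations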

RE: Spark scala REPL - Unable to create sqlContext

2015-10-25 Thread Ge, Yao (Y.)
Thanks. I wonder why this is not widely reported in the user forum. The REPL shell is basically broken in 1.5.0 and 1.5.1. -Yao

[SPARK-9776] Another instance of Derby may have already booted the database #8947

2015-10-23 Thread Ge, Yao (Y.)
I have not been able to run spark-shell in yarn-cluster mode since 1.5.0 due to the same issue described in [SPARK-9776]. Did this pull request fix the issue? https://github.com/apache/spark/pull/8947 I still have the same problem with 1.5.1. (I am running on HDP 2.2.6 with Hadoop 2.6.) Thanks.

How to restart a failed Spark Streaming Application automatically in client mode on YARN

2015-10-22 Thread y
I'm managing Spark Streaming applications which run on Cloud Dataproc (https://cloud.google.com/dataproc/). Spark Streaming applications running on a Cloud Dataproc cluster seem to run in client mode on YARN. Some of my applications sometimes stop due to application failures. I'd like YARN to restart them automatically.
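
For reference, the knob usually cited here is spark.yarn.maxAppAttempts, sketched below; note the caveat in the comments, since it only buys automatic restarts in yarn-cluster mode, which is the crux of the question.

    import org.apache.spark.SparkConf

    // Only effective in yarn-cluster mode, where the driver runs inside the
    // application master and YARN can re-launch it on failure; a client-mode
    // driver (as on Dataproc here) lives outside YARN and needs an external
    // supervisor to be restarted.
    val conf = new SparkConf()
      .setAppName("my-streaming-app")
      .set("spark.yarn.maxAppAttempts", "4") // YARN retries the application up to 4 times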

Cannot get spark-streaming_2.10-1.5.0.pom from the maven repository

2015-10-12 Thread y
When I access the following URL, I often get a 404 error and cannot get the POM file of "spark-streaming_2.10-1.5.0.pom". http://central.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.0/spark-streaming_2.10-1.5.0.pom Are there any problems with the Maven repository?
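
As a sanity check, the artifact should resolve through a normal build tool (which retries and can fall back to mirrors) rather than a hand-rolled HTTP fetch; a minimal build.sbt sketch with the standard coordinates:

    // build.sbt
    scalaVersion := "2.10.4" // so %% resolves to the _2.10 artifact

    libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.5.0" % "provided"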

Re: How to avoid being killed by YARN node manager?

2015-03-30 Thread Y. Sakamoto
Thank you for your reply, and sorry for the slow confirmation. I'll try tuning 'spark.yarn.executor.memoryOverhead'. Thanks, Yuichiro Sakamoto On 2015/03/25 0:56, Sandy Ryza wrote: Hi Yuichiro, The way to avoid this is to boost spark.yarn.executor.memoryOverhead until the executors have
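
For reference, a sketch of passing that setting; the 1024 MB figure is only an illustrative starting point, and the right value depends on the workload.

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.memory", "4g")
      // Off-heap headroom per executor container, in MB; raise it until YARN's
      // node manager stops killing executors for exceeding the container limit.
      .set("spark.yarn.executor.memoryOverhead", "1024")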

RE: Connecting Cassandra by unknown host

2015-02-06 Thread Sun, Vincent Y
Hi, I am no expert but have a small application working with Spark and Cassandra. I faced these issues when we were deploying our cluster on EC2 instances with some machines on public
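
Assuming the spark-cassandra-connector is in play (the thread doesn't say), the contact point is set via spark.cassandra.connection.host; a sketch follows, where the IP, keyspace, and table are placeholders. "Unknown host" errors often come down to passing a hostname the executors cannot resolve.

    import com.datastax.spark.connector._
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("cassandra-read")
      // Use an address every executor can resolve and reach; mixing EC2-internal
      // and public addresses is a classic source of unknown-host failures.
      .set("spark.cassandra.connection.host", "10.0.0.12")

    val sc = new SparkContext(conf)
    val rows = sc.cassandraTable("my_keyspace", "my_table") // hypothetical keyspace/table
    println(rows.count())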

RE: get null pointer exception in newAPIHadoopRDD.map()

2015-02-06 Thread Sun, Vincent Y
Thanks. The data is there; I have checked the row count and dumped it to a file. -Vincent From Ted Yu: Is it possible that value.get
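
A defensive sketch for this class of NPE, kept generic on purpose since the thread doesn't show the record types; extract stands in for whatever value.get-style accessor is returning null.

    import org.apache.spark.rdd.RDD

    // Guard a possibly-null value and a possibly-null field access with Option
    // before mapping; rdd stands in for the output of sc.newAPIHadoopRDD(...).
    def dropNulls[K, V, R](rdd: RDD[(K, V)])(extract: V => R): RDD[(K, R)] =
      rdd.flatMap { case (k, v) =>
        Option(v).flatMap(value => Option(extract(value))).map(r => (k, r))
      }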

Decision Tree with libsvmtools datasets

2014-12-10 Thread Ge, Yao (Y.)
I am testing the decision tree using the iris.scale data set (http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#iris). In the data set there are three class labels: 1, 2, and 3. However, in the following code I have to set numClasses = 4; otherwise I get an ArrayIndexOutOfBoundsException
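
For what it's worth, the likely cause: MLlib expects class labels in {0, ..., numClasses - 1}, so the 1-based libsvm labels {1, 2, 3} only work with numClasses = 4 (label 3 must be a valid index). A sketch of the usual fix, shifting labels down so numClasses = 3 works:

    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.tree.DecisionTree
    import org.apache.spark.mllib.util.MLUtils

    val raw = MLUtils.loadLibSVMFile(sc, "iris.scale") // sc: existing SparkContext
    // The file's labels are 1, 2, 3; MLlib wants 0, 1, 2
    val data = raw.map(lp => LabeledPoint(lp.label - 1, lp.features))

    val model = DecisionTree.trainClassifier(data, numClasses = 3,
      categoricalFeaturesInfo = Map[Int, Int](), impurity = "gini",
      maxDepth = 5, maxBins = 32)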

Decision Tree with Categorical Features

2014-12-10 Thread Ge, Yao (Y.)
Can anyone provide example code for using categorical features in a decision tree? Thanks! -Yao
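
A minimal sketch (not from the thread): categoricalFeaturesInfo maps a feature's column index to its number of categories; unlisted features are treated as continuous, and categorical values must be encoded as 0 .. arity-1.

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.tree.DecisionTree

    // Toy data: feature 0 is continuous, feature 1 is categorical with 3 categories
    val data = sc.parallelize(Seq( // sc: existing SparkContext
      LabeledPoint(0.0, Vectors.dense(1.2, 0.0)),
      LabeledPoint(1.0, Vectors.dense(3.4, 2.0)),
      LabeledPoint(0.0, Vectors.dense(0.9, 1.0)),
      LabeledPoint(1.0, Vectors.dense(4.1, 2.0))
    ))

    // Feature index 1 has 3 categories (0, 1, 2); feature 0 stays continuous.
    // maxBins must be at least the largest category count.
    val model = DecisionTree.trainClassifier(data, numClasses = 2,
      categoricalFeaturesInfo = Map(1 -> 3), impurity = "gini",
      maxDepth = 4, maxBins = 32)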

scala.MatchError: class java.sql.Timestamp

2014-10-19 Thread Ge, Yao (Y.)
I am working with Spark 1.1.0, and I believe Timestamp is a supported data type for Spark SQL. However, I keep getting this MatchError for java.sql.Timestamp when I try to use reflection to register a Java Bean with a Timestamp field. Is anything wrong with my code below? public
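
If the Scala route is an option, here is a sketch using case-class reflection instead of the Java bean path (assuming Spark 1.1's createSchemaRDD implicit); this may sidestep a bean-reflection-specific MatchError, though that is an assumption rather than a confirmed fix.

    import java.sql.Timestamp
    import org.apache.spark.sql.SQLContext

    case class Event(name: String, ts: Timestamp)

    val sqlContext = new SQLContext(sc) // sc: existing SparkContext
    import sqlContext.createSchemaRDD   // Spark 1.x implicit: RDD[Product] -> SchemaRDD

    val events = sc.parallelize(Seq(Event("a", new Timestamp(System.currentTimeMillis()))))
    events.registerTempTable("events")
    sqlContext.sql("SELECT name, ts FROM events").collect().foreach(println)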

RE: scala.MatchError: class java.sql.Timestamp

2014-10-19 Thread Ge, Yao (Y.)
...(RemoteTestRunner.java:197) From Wang, Daoyuan: Can you provide the exception stack? Thanks, Daoyuan

Exception Logging

2014-10-16 Thread Ge, Yao (Y.)
I need help to better trap exceptions in map functions. What is the best way to catch an exception and provide helpful diagnostic information, such as the source of the input, e.g. the file name (and ideally the line number, if I am processing a text file)? -Yao
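
One pattern worth sketching (not the only way): read with wholeTextFiles so the file name travels with each record, number the lines, and trap per-record failures so the log carries both. The parse function below is a hypothetical stand-in for the real map logic. Caveat: wholeTextFiles loads each file whole, so this suits modestly sized files.

    import scala.util.{Failure, Success, Try}

    def parse(line: String): Int = line.trim.toInt // hypothetical per-line parser

    // (file, content) pairs keep the source name; zipWithIndex numbers the lines
    val parsed = sc.wholeTextFiles("data/*.txt").flatMap { case (file, content) =>
      content.split("\n").zipWithIndex.flatMap { case (line, idx) =>
        Try(parse(line)) match {
          case Success(v) => Some(v)
          case Failure(e) =>
            // Executor-side log: appears in that task's stderr with full context
            System.err.println(s"Bad record at $file:${idx + 1}: ${e.getMessage}")
            None
        }
      }
    }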

RE: Dedup

2014-10-09 Thread Ge, Yao (Y.)
Thanks much, Sean! -Yao From Sean Owen: I think the question is about copying the argument. If it's an immutable value like String, yes, just

Dedup

2014-10-08 Thread Ge, Yao (Y.)
I need to do deduplication processing in Spark. The current plan is to generate a tuple where the key is the dedup criterion and the value is the original input. I am thinking of using reduceByKey to discard duplicate values. If I do that, can I simply return the first argument, or should I return a copy?
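
A sketch of that plan (and per the reply above: for an immutable value like String, returning the first argument is fine; no copy needed). The case-insensitive key is just an illustrative dedup criterion.

    // lines: RDD[String] (hypothetical input); key on the dedup criterion,
    // then keep one value per key.
    val deduped = lines
      .map(line => (line.toLowerCase, line)) // dedup key: case-insensitive text
      .reduceByKey((first, second) => first) // keep one representative; note the
                                             // "first" seen is arbitrary, since
                                             // reduce order is not deterministic
      .values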

HBase and non-existent TableInputFormat

2014-09-16 Thread Y. Dong
Hello, I’m currently using spark-core 1.1 and hbase 0.98.5, and I want to simply read from HBase. The Java code is attached. However, the problem is that TableInputFormat does not even exist in the hbase-client API. Is there any other way I can read from HBase? Thanks SparkConf sconf = new
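
For what it's worth, in HBase 0.98 org.apache.hadoop.hbase.mapreduce.TableInputFormat ships in the hbase-server artifact rather than hbase-client, so adding that dependency is the usual answer. A Scala sketch of the read (the attached Java code should translate directly; "my_table" is a placeholder):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat

    // Requires the hbase-server artifact on the classpath (TableInputFormat
    // lives there in 0.98, not in hbase-client)
    val hconf = HBaseConfiguration.create()
    hconf.set(TableInputFormat.INPUT_TABLE, "my_table")

    val rdd = sc.newAPIHadoopRDD(hconf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])
    println(rdd.count())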

RE: KMeans - java.lang.IllegalArgumentException: requirement failed

2014-08-12 Thread Ge, Yao (Y.)
The indices array will need to be in ascending order. In many cases, it is probably easier to use the other two forms of the Vectors.sparse function if the indices and value positions are not naturally sorted. -Yao
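
To illustrate (a sketch; the sizes and values are arbitrary): the (indices, values) form assumes ascending indices, while the Seq-of-pairs form sorts for you.

    import org.apache.spark.mllib.linalg.Vectors

    // OK: indices 1 < 3 < 4 are in ascending order
    val v1 = Vectors.sparse(5, Array(1, 3, 4), Array(0.5, 1.0, 2.0))

    // Breaks the ascending-order assumption (3 > 1) and can trigger the
    // "requirement failed" seen below:
    // val bad = Vectors.sparse(5, Array(3, 1, 4), Array(0.5, 1.0, 2.0))

    // Easier when unsorted: the (index, value) pair form sorts internally
    val v2 = Vectors.sparse(5, Seq((3, 0.5), (1, 1.0), (4, 2.0)))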

KMeans - java.lang.IllegalArgumentException: requirement failed

2014-08-11 Thread Ge, Yao (Y.)
I am trying to train a KMeans model with sparse vectors on Spark 1.0.1. When I run the training I get the following exception: java.lang.IllegalArgumentException: requirement failed at scala.Predef$.require(Predef.scala:221) at