Printing ML pipeline model in Python.

2016-03-14 Thread VISHNU SUBRAMANIAN
Hi All, I am using Spark 1.6 with PySpark. I am trying to build a RandomForest classifier model using the ML pipeline in Python. When I try to print the model I get the value below: RandomForestClassificationModel (uid=rfc_be9d4f681b92) with 10 trees. When I use the MLlib RandomForest model
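
For reference, the Scala ML API exposes the full forest through toDebugString; a minimal sketch, assuming Spark 1.6 and a DataFrame data with "label" and "features" columns (names are illustrative):

    import org.apache.spark.ml.classification.RandomForestClassifier

    val rf = new RandomForestClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setNumTrees(10)
    val model = rf.fit(data)

    // prints every tree in the ensemble, not just the one-line summary
    println(model.toDebugString)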

Re: how to convert millisecond time to SQL timestamp

2016-02-01 Thread VISHNU SUBRAMANIAN
Hi, If you need a DataFrame-specific solution, you can try the below: df.select(from_unixtime(col("max(utcTimestamp)")/1000)) On Tue, 2 Feb 2016 at 09:44 Ted Yu wrote: > See related thread on using Joda DateTime: > http://search-hadoop.com/m/q3RTtSfi342nveex1=RE+NPE+ >
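
A minimal Scala sketch of the same conversion, assuming a column utcTimestamp holding epoch milliseconds (the column name is illustrative):

    import org.apache.spark.sql.functions.{col, from_unixtime}

    // from_unixtime expects seconds, so divide the millisecond value by 1000
    val withTs = df.select(from_unixtime(col("utcTimestamp") / 1000).alias("ts"))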

Re: How to accelerate reading json file?

2016-01-05 Thread VISHNU SUBRAMANIAN
Hi, You can try this: sqlContext.read.format("json").option("samplingRatio", "0.1").load("path") If it still takes time, feel free to experiment with the samplingRatio. Thanks, Vishnu On Wed, Jan 6, 2016 at 12:43 PM, Gavin Yue wrote: > I am trying to read json files
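
Spelled out as a runnable Scala sketch (the path is illustrative); if the schema is known in advance, passing it explicitly skips the sampling pass entirely:

    // sample 10% of the input to infer the schema instead of scanning everything
    val df = sqlContext.read
      .format("json")
      .option("samplingRatio", "0.1")
      .load("hdfs:///data/events.json")

    // alternatively, an explicit schema avoids inference altogether
    // val df = sqlContext.read.schema(mySchema).json("hdfs:///data/events.json")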

Re: custom schema in spark throwing error

2015-12-21 Thread VISHNU SUBRAMANIAN
Try this: val customSchema = StructType(Array( StructField("year", IntegerType, true), StructField("make", StringType, true), StructField("model", StringType, true) )) On Mon, Dec 21, 2015 at 8:26 AM, Divya Gehlot wrote: > >1. scala> import
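
A self-contained Scala sketch with the required imports; the CSV source and file name are illustrative, assuming the spark-csv package:

    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    val customSchema = StructType(Array(
      StructField("year", IntegerType, true),
      StructField("make", StringType, true),
      StructField("model", StringType, true)))

    // apply the schema when reading, so nothing is inferred
    val cars = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .schema(customSchema)
      .load("cars.csv")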

How VectorIndexer works in Spark ML pipelines

2015-10-15 Thread VISHNU SUBRAMANIAN
Hi All, I am trying to use the VectorIndexer (feature extraction) technique available in Spark ML Pipelines. I ran the example in the documentation: val featureIndexer = new VectorIndexer() .setInputCol("features") .setOutputCol("indexedFeatures") .setMaxCategories(4) .fit(data)
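
A minimal runnable sketch around that example, assuming data is a DataFrame with a vector-valued "features" column:

    import org.apache.spark.ml.feature.VectorIndexer

    // any feature with at most 4 distinct values is treated as categorical
    // and re-indexed; the remaining features are left as continuous
    val featureIndexer = new VectorIndexer()
      .setInputCol("features")
      .setOutputCol("indexedFeatures")
      .setMaxCategories(4)
      .fit(data)

    val indexed = featureIndexer.transform(data)
    indexed.show()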

Re: UDF in spark

2015-07-08 Thread VISHNU SUBRAMANIAN
by my query. I need to run the mentioned block again to use the UDF. Is there any way to keep the UDF in sqlContext permanently? Thanks, Vinod On Wed, Jul 8, 2015 at 7:16 AM, VISHNU SUBRAMANIAN johnfedrickena...@gmail.com wrote: Hi, sqlContext.udf.register(udfname, functionname

Re: UDF in spark

2015-07-08 Thread VISHNU SUBRAMANIAN
Hi, sqlContext.udf.register("udfname", functionname _) Example: def square(x: Int): Int = { x * x } Register the UDF as below: sqlContext.udf.register("square", square _) Thanks, Vishnu On Wed, Jul 8, 2015 at 2:23 PM, vinod kumar vinodsachin...@gmail.com wrote: Hi Everyone, I am new to Spark. May I
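
The same advice as a complete Scala sketch (the table name t is illustrative); note that the registration lives in the SQLContext, so a new context needs the UDF registered again:

    // an ordinary Scala function...
    def square(x: Int): Int = x * x

    // ...registered under a SQL-visible name (note the quoted name)
    sqlContext.udf.register("square", square _)

    sqlContext.sql("SELECT square(id) FROM t").show()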

Re: used cores are less than total no. of cores

2015-02-24 Thread VISHNU SUBRAMANIAN
Try adding --total-executor-cores 5, where 5 is the number of cores. Thanks, Vishnu On Wed, Feb 25, 2015 at 11:52 AM, Somnath Pandeya somnath_pand...@infosys.com wrote: Hi All, I am running a simple word count example on Spark (standalone cluster). In the UI it is showing For each
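
For example, on a standalone cluster (the master URL, class, and jar are illustrative):

    ./bin/spark-submit \
      --master spark://master:7077 \
      --total-executor-cores 5 \
      --class com.example.WordCount \
      wordcount.jar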

Re: Running Example Spark Program

2015-02-22 Thread VISHNU SUBRAMANIAN
Try restarting your Spark cluster: ./sbin/stop-all.sh ./sbin/start-all.sh Thanks, Vishnu On Sun, Feb 22, 2015 at 7:30 PM, Surendran Duraisamy 2013ht12...@wilp.bits-pilani.ac.in wrote: Hello All, I am new to Apache Spark. I am trying to run JavaKMeans.java from the Spark examples on my Ubuntu

Re: Hive/Hbase for low latency

2015-02-11 Thread VISHNU SUBRAMANIAN
Hi Siddharth, It depends on what exactly you are trying to solve, but the connectivity between Cassandra and Spark is good. Thanks, Vishnu On Wed, Feb 11, 2015 at 7:47 PM, Siddharth Ubale siddharth.ub...@syncoms.com wrote: Hi, I am new

Re: Question related to Spark SQL

2015-02-11 Thread VISHNU SUBRAMANIAN
in HiveQL. Row[] results = sqlContext.sql(sqlClause).collect(); Is my understanding right? Regards, Ashish On Wed, Feb 11, 2015 at 4:42 PM, VISHNU SUBRAMANIAN johnfedrickena...@gmail.com wrote: Hi Ashish, In order to answer your question, I assume that you are planning to process
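
A Scala sketch of the same pattern (the query is illustrative); collect() pulls the whole result to the driver, so it is only safe for small result sets:

    // sqlContext should be a HiveContext for HiveQL statements
    val results = sqlContext.sql("SELECT name, age FROM people").collect()
    results.foreach(println)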

Re: Re: How can I read this avro file using spark scala?

2015-02-11 Thread VISHNU SUBRAMANIAN
Check this link: https://github.com/databricks/spark-avro (the home page for the spark-avro project). Thanks, Vishnu On Wed, Feb 11, 2015 at 10:19 PM, Todd bit1...@163.com wrote: Databricks provides sample code on its website... but I can't find it now. At 2015-02-12 00:43:07, captainfranz
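
A minimal Scala sketch of reading Avro with that package (the package version and path are illustrative):

    // launch with e.g. --packages com.databricks:spark-avro_2.10:2.0.1
    val df = sqlContext.read
      .format("com.databricks.spark.avro")
      .load("episodes.avro")
    df.show()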

Re: getting the cluster elements from kmeans run

2015-02-11 Thread VISHNU SUBRAMANIAN
You can use model.predict(point); that gives you the cluster assignment for each point, which you can pair with the point itself: rdd.map(x => (x, model.predict(x))) Thanks, Vishnu On Wed, Feb 11, 2015 at 11:06 PM, Harini Srinivasan har...@us.ibm.com wrote: Hi, Is there a way to get the elements of each cluster after
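
A minimal MLlib sketch, assuming points is the RDD[Vector] the model was trained on (names and parameters are illustrative):

    import org.apache.spark.mllib.clustering.KMeans

    val model = KMeans.train(points, 3, 20) // k = 3, 20 iterations

    // pair each point with the index of its nearest cluster center,
    // then group to get the members of each cluster
    val byCluster = points.map(x => (model.predict(x), x)).groupByKey()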

Re: NaiveBayes classifier causes ShuffleDependency class cast exception

2015-02-06 Thread VISHNU SUBRAMANIAN
Can you try creating just a single SparkContext and then running your code? If you want to use it for streaming, pass the same SparkContext object instead of the conf. Note: instead of replying just to me, use reply-all so that the post is visible to the community. That way you can expect
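
A sketch of the suggested setup, reusing one SparkContext for streaming (the app name and batch interval are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val sc = new SparkContext(new SparkConf().setAppName("app"))

    // build the StreamingContext from the existing SparkContext,
    // rather than creating a second context from the conf
    val ssc = new StreamingContext(sc, Seconds(1))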

Re: Shuffle Dependency Casting error

2015-02-05 Thread VISHNU SUBRAMANIAN
Hi, Could you share the code snippet? Thanks, Vishnu On Thu, Feb 5, 2015 at 11:22 PM, aanilpala aanilp...@gmail.com wrote: Hi, I am working on a text mining project and I want to use the NaiveBayesClassifier of MLlib to classify some stream items. So, I have two Spark contexts, one of which is a

Re: Java Kafka Word Count Issue

2015-02-02 Thread VISHNU SUBRAMANIAN
You can use updateStateByKey() to perform the above operation. On Mon, Feb 2, 2015 at 4:29 PM, Jadhav Shweta jadhav.shw...@tcs.com wrote: Hi Sean, Kafka Producer is working fine. This is related to Spark. How can I configure Spark so that it will make sure to remember the count from the
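
A minimal Scala sketch of a stateful word count, assuming words is a DStream[String] (the checkpoint path is illustrative); updateStateByKey requires checkpointing to be enabled:

    ssc.checkpoint("hdfs:///tmp/checkpoints")

    // fold each batch's counts into the running total per word
    def updateCount(values: Seq[Int], state: Option[Int]): Option[Int] =
      Some(values.sum + state.getOrElse(0))

    val totals = words.map(w => (w, 1)).updateStateByKey(updateCount _)
    totals.print()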

Re: Failed to save RDD as text file to local file system

2015-01-08 Thread VISHNU SUBRAMANIAN
Looks like it is trying to save the file to HDFS. Check if you have set any Hadoop path in your system. On Fri, Jan 9, 2015 at 12:14 PM, Raghavendra Pandey raghavendra.pan...@gmail.com wrote: Can you check permissions etc., as I am able to run r.saveAsTextFile("file:///home/cloudera/tmp/out1")
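
A sketch of forcing the local filesystem explicitly (the path is illustrative); without a scheme, the path is resolved against fs.defaultFS, which often points at HDFS:

    // the file:// scheme bypasses the configured default filesystem;
    // on a cluster, the directory must be writable on every worker node
    r.saveAsTextFile("file:///home/cloudera/tmp/out1")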