Printing ML pipeline model in Python.

2016-03-14 Thread VISHNU SUBRAMANIAN
Hi All, I am using Spark 1.6 with PySpark. I am trying to build a RandomForest classifier model using the ML pipeline in Python. When I try to print the model I get the value below: RandomForestClassificationModel (uid=rfc_be9d4f681b92) with 10 trees. When I use the MLlib RandomForest model
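
For reference, the Scala ML API exposes the full forest through toDebugString; a minimal sketch, assuming Spark 1.6 and a DataFrame data with "label" and "features" columns (names are illustrative):

    import org.apache.spark.ml.classification.RandomForestClassifier

    val rf = new RandomForestClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setNumTrees(10)
    val model = rf.fit(data)

    // prints every tree in the ensemble, not just the one-line summary
    println(model.toDebugString)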

Re: how to convert millisecond time to SQL timestamp

2016-02-01 Thread VISHNU SUBRAMANIAN
Hi, If you need a DataFrame-specific solution, you can try the below: df.select(from_unixtime(col("max(utcTimestamp)")/1000)) On Tue, 2 Feb 2016 at 09:44 Ted Yu wrote: > See related thread on using Joda DateTime: > http://search-hadoop.com/m/q3RTtSfi342nveex1=RE+NPE+ >
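
A minimal Scala sketch of the same conversion, assuming a column utcTimestamp holding epoch milliseconds (the column name is illustrative):

    import org.apache.spark.sql.functions.{col, from_unixtime}

    // from_unixtime expects seconds, so divide the millisecond value by 1000
    val withTs = df.select(from_unixtime(col("utcTimestamp") / 1000).alias("ts"))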

Re: How to accelerate reading json file?

2016-01-05 Thread VISHNU SUBRAMANIAN
Hi, You can try this: sqlContext.read.format("json").option("samplingRatio", "0.1").load("path") If it still takes time, feel free to experiment with the samplingRatio. Thanks, Vishnu On Wed, Jan 6, 2016 at 12:43 PM, Gavin Yue wrote: > I am trying to read json files
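
Spelled out as a runnable Scala sketch (the path is illustrative); if the schema is known in advance, passing it explicitly skips the sampling pass entirely:

    // sample 10% of the input to infer the schema instead of scanning everything
    val df = sqlContext.read
      .format("json")
      .option("samplingRatio", "0.1")
      .load("hdfs:///data/events.json")

    // alternatively, an explicit schema avoids inference altogether
    // val df = sqlContext.read.schema(mySchema).json("hdfs:///data/events.json")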

Re: custom schema in spark throwing error

2015-12-21 Thread VISHNU SUBRAMANIAN
Try this: val customSchema = StructType(Array( StructField("year", IntegerType, true), StructField("make", StringType, true), StructField("model", StringType, true) )) On Mon, Dec 21, 2015 at 8:26 AM, Divya Gehlot wrote: > >1. scala> import
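
A self-contained Scala sketch with the required imports; the CSV source and file name are illustrative, assuming the spark-csv package:

    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    val customSchema = StructType(Array(
      StructField("year", IntegerType, true),
      StructField("make", StringType, true),
      StructField("model", StringType, true)))

    // apply the schema when reading, so nothing is inferred
    val cars = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .schema(customSchema)
      .load("cars.csv")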

How VectorIndexer works in Spark ML pipelines

2015-10-15 Thread VISHNU SUBRAMANIAN
Hi All, I am trying to use the VectorIndexer (feature extraction) technique available in Spark ML Pipelines. I ran the example in the documentation: val featureIndexer = new VectorIndexer() .setInputCol("features") .setOutputCol("indexedFeatures") .setMaxCategories(4) .fit(data)
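
A minimal runnable sketch around that example, assuming data is a DataFrame with a vector-valued "features" column:

    import org.apache.spark.ml.feature.VectorIndexer

    // any feature with at most 4 distinct values is treated as categorical
    // and re-indexed; the remaining features are left as continuous
    val featureIndexer = new VectorIndexer()
      .setInputCol("features")
      .setOutputCol("indexedFeatures")
      .setMaxCategories(4)
      .fit(data)

    val indexed = featureIndexer.transform(data)
    indexed.show()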

Re: UDF in spark

2015-07-08 Thread VISHNU SUBRAMANIAN
by my query. I need to run the mentioned block again to use the UDF. Is there any way to keep the UDF in sqlContext permanently? Thanks, Vinod On Wed, Jul 8, 2015 at 7:16 AM, VISHNU SUBRAMANIAN johnfedrickena...@gmail.com wrote: Hi, sqlContext.udf.register(udfname, functionname

Re: UDF in spark

2015-07-08 Thread VISHNU SUBRAMANIAN
Hi, sqlContext.udf.register("udfname", functionname _) Example: def square(x: Int): Int = { x * x } Register the UDF as below: sqlContext.udf.register("square", square _) Thanks, Vishnu On Wed, Jul 8, 2015 at 2:23 PM, vinod kumar vinodsachin...@gmail.com wrote: Hi Everyone, I am new to Spark. May I
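
The same advice as a complete Scala sketch (the table name t is illustrative); note that the registration lives in the SQLContext, so a new context needs the UDF registered again:

    // an ordinary Scala function...
    def square(x: Int): Int = x * x

    // ...registered under a SQL-visible name (note the quoted name)
    sqlContext.udf.register("square", square _)

    sqlContext.sql("SELECT square(id) FROM t").show()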

Re: used cores are less than total no. of cores

2015-02-24 Thread VISHNU SUBRAMANIAN
Try adding --total-executor-cores 5, where 5 is the number of cores. Thanks, Vishnu On Wed, Feb 25, 2015 at 11:52 AM, Somnath Pandeya somnath_pand...@infosys.com wrote: Hi All, I am running a simple word count example on Spark (standalone cluster). In the UI it is showing For each
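
For example, on a standalone cluster (the master URL, class, and jar are illustrative):

    ./bin/spark-submit \
      --master spark://master:7077 \
      --total-executor-cores 5 \
      --class com.example.WordCount \
      wordcount.jar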

Re: Running Example Spark Program

2015-02-22 Thread VISHNU SUBRAMANIAN
Try restarting your Spark cluster: ./sbin/stop-all.sh ./sbin/start-all.sh Thanks, Vishnu On Sun, Feb 22, 2015 at 7:30 PM, Surendran Duraisamy 2013ht12...@wilp.bits-pilani.ac.in wrote: Hello All, I am new to Apache Spark. I am trying to run JavaKMeans.java from the Spark examples on my Ubuntu

Re: Hive/Hbase for low latency

2015-02-11 Thread VISHNU SUBRAMANIAN
Hi Siddharth, It depends on what exactly you are trying to solve, but the connectivity between Cassandra and Spark is good. Thanks, Vishnu On Wed, Feb 11, 2015 at 7:47 PM, Siddharth Ubale siddharth.ub...@syncoms.com wrote: Hi, I am new

Re: Question related to Spark SQL

2015-02-11 Thread VISHNU SUBRAMANIAN
in HiveQL. Row[] results = sqlContext.sql(sqlClause).collect(); Is my understanding right? Regards, Ashish On Wed, Feb 11, 2015 at 4:42 PM, VISHNU SUBRAMANIAN johnfedrickena...@gmail.com wrote: Hi Ashish, In order to answer your question, I assume that you are planning to process
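
A Scala sketch of the same pattern (the query is illustrative); collect() pulls the whole result to the driver, so it is only safe for small result sets:

    // sqlContext should be a HiveContext for HiveQL statements
    val results = sqlContext.sql("SELECT name, age FROM people").collect()
    results.foreach(println)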

Re: Re: How can I read this avro file using spark scala?

2015-02-11 Thread VISHNU SUBRAMANIAN
Check this link: https://github.com/databricks/spark-avro (the home page for the spark-avro project). Thanks, Vishnu On Wed, Feb 11, 2015 at 10:19 PM, Todd bit1...@163.com wrote: Databricks provides sample code on its website... but I can't find it now. At 2015-02-12 00:43:07, captainfranz
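
A minimal Scala sketch of reading Avro with that package (the package version and path are illustrative):

    // launch with e.g. --packages com.databricks:spark-avro_2.10:2.0.1
    val df = sqlContext.read
      .format("com.databricks.spark.avro")
      .load("episodes.avro")
    df.show()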

Re: getting the cluster elements from kmeans run

2015-02-11 Thread VISHNU SUBRAMANIAN
You can use model.predict(point); that gives you the cluster assignment for each point, which you can pair with the point itself: rdd.map(x => (x, model.predict(x))) Thanks, Vishnu On Wed, Feb 11, 2015 at 11:06 PM, Harini Srinivasan har...@us.ibm.com wrote: Hi, Is there a way to get the elements of each cluster after
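
A minimal MLlib sketch, assuming points is the RDD[Vector] the model was trained on (names and parameters are illustrative):

    import org.apache.spark.mllib.clustering.KMeans

    val model = KMeans.train(points, 3, 20) // k = 3, 20 iterations

    // pair each point with the index of its nearest cluster center,
    // then group to get the members of each cluster
    val byCluster = points.map(x => (model.predict(x), x)).groupByKey()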

Re: NaiveBayes classifier causes ShuffleDependency class cast exception

2015-02-06 Thread VISHNU SUBRAMANIAN
Can you try creating just a single SparkContext and then running your code? If you want to use it for streaming, pass the same SparkContext object instead of the conf. Note: instead of replying just to me, use reply-all so that the post is visible to the community. That way you can expect
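
A sketch of the suggested setup, reusing one SparkContext for streaming (the app name and batch interval are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val sc = new SparkContext(new SparkConf().setAppName("app"))

    // build the StreamingContext from the existing SparkContext,
    // rather than creating a second context from the conf
    val ssc = new StreamingContext(sc, Seconds(1))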

Re: Shuffle Dependency Casting error

2015-02-05 Thread VISHNU SUBRAMANIAN
Hi, Could you share the code snippet? Thanks, Vishnu On Thu, Feb 5, 2015 at 11:22 PM, aanilpala aanilp...@gmail.com wrote: Hi, I am working on a text mining project and I want to use the NaiveBayesClassifier of MLlib to classify some stream items. So, I have two Spark contexts, one of which is a

Re: Java Kafka Word Count Issue

2015-02-02 Thread VISHNU SUBRAMANIAN
You can use updateStateByKey() to perform the above operation. On Mon, Feb 2, 2015 at 4:29 PM, Jadhav Shweta jadhav.shw...@tcs.com wrote: Hi Sean, Kafka Producer is working fine. This is related to Spark. How can I configure Spark so that it will make sure to remember the count from the
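
A minimal Scala sketch of a stateful word count, assuming words is a DStream[String] (the checkpoint path is illustrative); updateStateByKey requires checkpointing to be enabled:

    ssc.checkpoint("hdfs:///tmp/checkpoints")

    // fold each batch's counts into the running total per word
    def updateCount(values: Seq[Int], state: Option[Int]): Option[Int] =
      Some(values.sum + state.getOrElse(0))

    val totals = words.map(w => (w, 1)).updateStateByKey(updateCount _)
    totals.print()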

Re: Failed to save RDD as text file to local file system

2015-01-08 Thread VISHNU SUBRAMANIAN
Looks like it is trying to save the file to HDFS. Check if you have set any Hadoop path in your system. On Fri, Jan 9, 2015 at 12:14 PM, Raghavendra Pandey raghavendra.pan...@gmail.com wrote: Can you check permissions etc., as I am able to run r.saveAsTextFile("file:///home/cloudera/tmp/out1")
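
A sketch of forcing the local filesystem explicitly (the path is illustrative); without a scheme, the path is resolved against fs.defaultFS, which often points at HDFS:

    // the file:// scheme bypasses the configured default filesystem;
    // on a cluster, the directory must be writable on every worker node
    r.saveAsTextFile("file:///home/cloudera/tmp/out1")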