Re: Linear regression + Janino Exception

2016-11-20 Thread janardhan shetty
Seems like this is associated with: https://issues.apache.org/jira/browse/SPARK-16845 On Sun, Nov 20, 2016 at 6:09 PM, janardhan shetty <janardhan...@gmail.com> wrote: > Hi, > > I am trying to execute Linear regression algorithm for Spark 2.02 and > hitting the below error wh

Linear regression + Janino Exception

2016-11-20 Thread janardhan shetty
Hi, I am trying to execute the Linear regression algorithm for Spark 2.0.2 and hitting the below error when I am fitting my training set: val lrModel = lr.fit(train) It happened on 2.0.0 as well. Any resolution steps are appreciated. *Error Snippet: * 16/11/20 18:03:45 *ERROR CodeGenerator: failed

Re: Usage of mllib api in ml

2016-11-20 Thread janardhan shetty
xample-model-selection-via-cross-validation) >> which use BinaryClassificationEvaluator, and it should be very >> straightforward to switch to MulticlassClassificationEvaluator. >> >> Thanks >> Yanbo >> >> On Sat, Nov 19, 2016 at 9:03 AM, janardhan shetty <j

Usage of mllib api in ml

2016-11-19 Thread janardhan shetty
Hi, I am trying to use the evaluation metrics offered by mllib MulticlassMetrics in the ml dataframe setting. Are there any examples of how to use it?
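
A minimal sketch of bridging the two APIs, assuming `predictions` is the DataFrame returned by model.transform(test) with the default "prediction" and "label" columns:

import org.apache.spark.mllib.evaluation.MulticlassMetrics

// Convert the relevant DataFrame columns to the RDD[(prediction, label)] shape
// that the RDD-based MulticlassMetrics expects.
val predictionAndLabels = predictions
  .select("prediction", "label")
  .rdd
  .map(row => (row.getDouble(0), row.getDouble(1)))

val metrics = new MulticlassMetrics(predictionAndLabels)
println(metrics.confusionMatrix)
println(metrics.accuracy)
println(metrics.weightedFMeasure)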

Re: Log-loss for multiclass classification

2016-11-16 Thread janardhan shetty
I am sure some work might be in the pipeline as it is a standard evaluation criterion. Any thoughts or links? On Nov 15, 2016 11:15 AM, "janardhan shetty" <janardhan...@gmail.com> wrote: > Hi, > > Best practice for multi class classification technique is to evaluate

Log-loss for multiclass classification

2016-11-15 Thread janardhan shetty
Hi, Best practice for multi-class classification is to evaluate the model by *log-loss*. Is there any jira or work going on to implement the same in *MulticlassClassificationEvaluator*? Currently it supports the following: (supports "f1" (default), "weightedPrecision", "weightedRecall",
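
MulticlassClassificationEvaluator does not expose log-loss in Spark 2.0, but it can be computed by hand from the classifier's probability output. A hedged sketch, assuming a `predictions` DataFrame with the default "probability" and "label" columns:

import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.functions.{avg, col, udf}

// Per-row multiclass log-loss: -log of the probability assigned to the true class,
// clipped away from 0 to avoid -Infinity.
val eps = 1e-15
val rowLogLoss = udf { (prob: Vector, label: Double) =>
  val p = math.max(eps, math.min(1.0 - eps, prob(label.toInt)))
  -math.log(p)
}

val logLoss = predictions
  .select(rowLogLoss(col("probability"), col("label")).alias("ll"))
  .agg(avg("ll"))
  .first()
  .getDouble(0)
println(s"log-loss = $logLoss")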

Re: Convert SparseVector column to DenseVector column

2016-11-14 Thread janardhan shetty
3), Array(0.1, 0.3))), > (0.2, Vectors.sparse(16, Array(0, 3), Array(0.1, 0.3.toDF("a", "b") > df.select(toSV($"b")) > > // maropu > > > On Mon, Nov 14, 2016 at 1:20 PM, janardhan shetty <janardhan...@gmail.com> > wrote: > >> H

Convert SparseVector column to DenseVector column

2016-11-13 Thread janardhan shetty
Hi, Is there any easy way of converting a dataframe column from SparseVector to DenseVector using the org.apache.spark.ml.linalg.DenseVector API? Spark ML 2.0
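
A minimal self-contained sketch (column names are made up): a UDF over the ml Vector type calling toDense does the conversion:

import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder.appName("SparseToDense").getOrCreate()
import spark.implicits._

val df = Seq(
  (0.1, Vectors.sparse(4, Array(0, 3), Array(1.0, 2.0))),
  (0.2, Vectors.sparse(4, Array(1, 2), Array(3.0, 4.0)))
).toDF("label", "features")

// toDense on org.apache.spark.ml.linalg.Vector returns a DenseVector.
val toDense = udf((v: Vector) => v.toDense)
df.withColumn("denseFeatures", toDense($"features")).show(false)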

Re: Spark ML : One hot Encoding for multiple columns

2016-11-13 Thread janardhan shetty
times for the rest of the columns. > > > > > On Wed, Aug 17, 2016 at 10:59 AM, janardhan shetty <janardhan...@gmail.com > > wrote: > >> I had already tried this way : >> >> scala> val featureCols = Array("category","newone") >> featureC

Re: Deep learning libraries for scala

2016-10-19 Thread janardhan shetty
a lot to think about using the language as a > tool to access algorithms in this instance unless you want to start > developing algorithms from grounds up ( and in which case you might not > require any libraries at all). > > On Sat, Oct 1, 2016 at 3:30 AM, janardhan shetty <janardhan...@gmail

Re: Deep learning libraries for scala

2016-10-01 Thread janardhan shetty
< suresh.thalam...@gmail.com> wrote: > Tensor frames > > https://spark-packages.org/package/databricks/tensorframes > > Hope that helps > -suresh > > On Sep 30, 2016, at 8:00 PM, janardhan shetty <janardhan...@gmail.com> > wrote: > > Looking for scala

Re: Deep learning libraries for scala

2016-09-30 Thread janardhan shetty
Looking for scala dataframes in particular ? On Fri, Sep 30, 2016 at 7:46 PM, Gavin Yue <yue.yuany...@gmail.com> wrote: > Skymind you could try. It is java > > I never test though. > > > On Sep 30, 2016, at 7:30 PM, janardhan shetty <janardhan...@gma

Re: Spark ML Decision Trees Algorithm

2016-09-30 Thread janardhan shetty
and various methods of constructing and pruning them for over 30 > years. I think it's rather a question for a historian at this point. > > On Fri, Sep 30, 2016 at 5:08 PM, janardhan shetty <janardhan...@gmail.com> > wrote: > >> Read this explanation but wonder

Deep learning libraries for scala

2016-09-30 Thread janardhan shetty
Hi, Are there any good libraries which can be used for Scala deep learning models? How can we integrate TensorFlow with Scala ML?

Re: Spark ML Decision Trees Algorithm

2016-09-30 Thread janardhan shetty
latest/mllib-decision-tree.html > > Thanks, > Kevin > > On Fri, Sep 30, 2016 at 1:14 AM, janardhan shetty <janardhan...@gmail.com> > wrote: > >> Hi, >> >> Any help here is appreciated .. >> >> On Wed, Sep 28, 2016 at 11:34 AM, janardhan s

Re: Spark ML Decision Trees Algorithm

2016-09-30 Thread janardhan shetty
Hi, Any help here is appreciated .. On Wed, Sep 28, 2016 at 11:34 AM, janardhan shetty <janardhan...@gmail.com> wrote: > Is there a reference to the research paper which is implemented in spark > 2.0 ? > > On Wed, Sep 28, 2016 at 9:52 AM, janardhan shetty <janardhan.

Re: Spark ML Decision Trees Algorithm

2016-09-28 Thread janardhan shetty
Is there a reference to the research paper which is implemented in spark 2.0 ? On Wed, Sep 28, 2016 at 9:52 AM, janardhan shetty <janardhan...@gmail.com> wrote: > Which algorithm is used under the covers while doing decision trees FOR > SPARK ? > for example: scikit-lear

Spark ML Decision Trees Algorithm

2016-09-28 Thread janardhan shetty
Which algorithm is used under the covers for decision trees in Spark? For example, scikit-learn (Python) uses an optimised version of the CART algorithm.

Re: SPARK-10835 in 2.0

2016-09-20 Thread janardhan shetty
Thanks Sean. On Sep 20, 2016 7:45 AM, "Sean Owen" <so...@cloudera.com> wrote: > Ah, I think that this was supposed to be changed with SPARK-9062. Let > me see about reopening 10835 and addressing it. > > On Tue, Sep 20, 2016 at 3:24 PM, janardhan shetty >

Re: SPARK-10835 in 2.0

2016-09-20 Thread janardhan shetty
Is this a bug? On Sep 19, 2016 10:10 PM, "janardhan shetty" <janardhan...@gmail.com> wrote: > Hi, > > I am hitting this issue. https://issues.apache.org/jira/browse/SPARK-10835 > . > > Issue seems to be resolved but resurfacing in 2.0 ML. Any workaround is > a

SPARK-10835 in 2.0

2016-09-19 Thread janardhan shetty
Hi, I am hitting this issue: https://issues.apache.org/jira/browse/SPARK-10835. The issue seems to be resolved but is resurfacing in 2.0 ML. Any workaround is appreciated. Note: the Pipeline has NGram before Word2Vec. Error: val word2Vec = new

Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-19 Thread janardhan shetty
lp" % "3.6.0", > "com.google.protobuf" % "protobuf-java" % "2.6.1", > "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models", > "org.scalatest" %% "scalatest" % "2.2.6&q

Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread janardhan shetty
Sep 18, 2016 at 2:21 PM, Sujit Pal <sujitatgt...@gmail.com> wrote: > Hi Janardhan, > > Maybe try removing the string "test" from this line in your build.sbt? > IIRC, this restricts the models JAR to be called from a test. > > "edu.stanford.nlp"

Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread janardhan shetty
glish-left3words-distsim.tagger" as class path, filename or URL at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:485) at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:765) On Sun, Sep 18, 2016 at 12:27 PM, janard

Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread janardhan shetty
Using: spark-shell --packages databricks:spark-corenlp:0.2.0-s_2.11 On Sun, Sep 18, 2016 at 12:26 PM, janardhan shetty <janardhan...@gmail.com> wrote: > Hi Jacek, > > Thanks for your response. This is the code I am trying to execute > > import org.apache.spark.sql

Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread janardhan shetty
m.com/@jaceklaskowski/ > Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > > On Sun, Sep 18, 2016 at 8:01 PM, janardhan shetty > <janardhan...@gmail.com> wrote: > > Hi, > > > > I am trying to use lemm

Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread janardhan shetty
Hi, I am trying to use lemmatization as a transformer and added the below to the build.sbt "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0", "com.google.protobuf" % "protobuf-java" % "2.6.1", "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier "models",
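
Per the reply further up in this thread, the likely fix is dropping the "test" scope so the CoreNLP models JAR sits on the main classpath rather than only for tests; a corrected sketch of the dependency block:

libraryDependencies ++= Seq(
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
  // no "test" scope here, so the tagger/lemmatizer models resolve at runtime
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models",
  "com.google.protobuf" % "protobuf-java" % "2.6.1"
)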

Re: LDA spark ML visualization

2016-09-13 Thread janardhan shetty
Any help to proceed on this problem is appreciated. On Sep 12, 2016 11:45 AM, "janardhan shetty" <janardhan...@gmail.com> wrote: > Hi, > > I am trying to visualize the LDA model developed in spark scala (2.0 ML) > in LDAvis. > > Is there any links to

LDA spark ML visualization

2016-09-12 Thread janardhan shetty
Hi, I am trying to visualize the LDA model developed in spark scala (2.0 ML) in LDAvis. Are there any links on converting the spark model parameters to the following 5 params to visualize? 1. φ, the K × W matrix containing the estimated probability mass function over the W terms in the vocabulary
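
A hedged sketch of pulling two of those quantities out of a fitted org.apache.spark.ml.clustering.LDAModel (the variable names ldaModel, countsDF and cvModel are assumptions): topicsMatrix is vocabSize × k, so φ is its transpose, and θ comes from the topicDistribution column added by transform:

// ldaModel: a fitted ml LDAModel; countsDF: the term-count DataFrame (with a "features"
// column) used to fit it; cvModel: the CountVectorizerModel that produced the counts.
val topics = ldaModel.topicsMatrix                    // vocabSize x k
val phi = Array.tabulate(topics.numCols, topics.numRows)((k, w) => topics(w, k))

// theta: per-document topic distribution, one row per document
val theta = ldaModel.transform(countsDF).select("topicDistribution")

// vocab: the W terms, in the same order as the rows of topicsMatrix
val vocab = cvModel.vocabulary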

Re: Spark transformations

2016-09-12 Thread janardhan shetty
ar no great solution. > > Sorry I don't have any answers, but wanted to chime in that I am also a > bit stuck on similar issues. Hope we can find a workable solution soon. > Cheers, > Thunder > > > > On Tue, Sep 6, 2016 at 1:32 PM janardhan shetty <janardhan...@gmail.com&

Re: Using spark package XGBoost

2016-09-08 Thread janardhan shetty
Tried to implement spark package in 2.0 https://spark-packages.org/package/rotationsymmetry/sparkxgboost but it is throwing the error: error: not found: type SparkXGBoostClassifier On Tue, Sep 6, 2016 at 11:26 AM, janardhan shetty <janardhan...@gmail.com> wrote: > Is this merged to

Difference between UDF and Transformer in Spark ML

2016-09-06 Thread janardhan shetty
Apart from the creation of a new column, what are the other differences between a Transformer and a UDF in Spark ML?
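
A small illustration of the practical difference (column names are made up): a UDF is an ad-hoc column expression, while a Transformer carries its configuration as Params, can sit in a Pipeline and be persisted. SQLTransformer is used here only as a convenient stand-in:

import org.apache.spark.ml.feature.SQLTransformer
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder.appName("UdfVsTransformer").getOrCreate()
import spark.implicits._

val df = Seq("spark", "ml").toDF("text")

// 1) Plain UDF: applied directly to a column.
val toUpper = udf((s: String) => s.toUpperCase)
df.withColumn("upper_text", toUpper($"text")).show()

// 2) Transformer: same effect, but usable as a Pipeline stage with its own params.
val upperTf = new SQLTransformer()
  .setStatement("SELECT *, upper(text) AS upper_text FROM __THIS__")
upperTf.transform(df).show()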

Re: Spark ML 2.1.0 new features

2016-09-06 Thread janardhan shetty
Jacek Laskowski > > https://medium.com/@jaceklaskowski/ > Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > > On Tue, Sep 6, 2016 at 10:27 PM, janardhan shetty > <janardhan...@gmail.com> wrote: >

Re: Spark transformations

2016-09-06 Thread janardhan shetty
orward checking* how can we get this information ? We have visibility into single element and not the entire column. On Sun, Sep 4, 2016 at 9:30 AM, janardhan shetty <janardhan...@gmail.com> wrote: > In scala Spark ML Dataframes. > > On Sun, Sep 4, 2016 at 9:16 AM, Somasundaram Se

Re: Spark ML 2.1.0 new features

2016-09-06 Thread janardhan shetty
Any links ? On Mon, Sep 5, 2016 at 1:50 PM, janardhan shetty <janardhan...@gmail.com> wrote: > Is there any documentation or links on the new features which we can > expect for Spark ML 2.1.0 release ? >

Re: Using spark package XGBoost

2016-09-06 Thread janardhan shetty
t; Pozdrawiam, >>> Jacek Laskowski >>> >>> https://medium.com/@jaceklaskowski/ >>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark >>> Follow me at https://twitter.com/jaceklaskowski >>> >>> >>> On Sun, Aug 14, 20

Spark ML 2.1.0 new features

2016-09-05 Thread janardhan shetty
Is there any documentation or links on the new features we can expect for the Spark ML 2.1.0 release?

Re: Spark transformations

2016-09-04 Thread janardhan shetty
In scala Spark ML Dataframes. On Sun, Sep 4, 2016 at 9:16 AM, Somasundaram Sekar < somasundar.se...@tigeranalytics.com> wrote: > Can you try this > > https://www.linkedin.com/pulse/hive-functions-udfudaf- > udtf-examples-gaurav-singh > > On 4 Sep 2016 9:38 pm, "

Spark transformations

2016-09-04 Thread janardhan shetty
Hi, Is there any chance that we can send multiple entire columns to a UDF and generate a new column for Spark ML? I see a similar approach with VectorAssembler but am not able to use a few classes/traits like HasInputCols, HasOutputCol, DefaultParamsWritable since they are private. Any leads/examples is
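
A minimal sketch of the UDF part (column names invented); note this does not need the private HasInputCols trait, since udf simply takes several Column arguments:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder.appName("MultiColumnUdf").getOrCreate()
import spark.implicits._

val df = Seq((1.0, 2.0, "a"), (3.0, 4.0, "b")).toDF("x", "y", "id")

// A single UDF can consume several columns at once and emit one new column.
val ratio = udf((x: Double, y: Double) => x / (x + y))
df.withColumn("ratio", ratio($"x", $"y")).show()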

Re: Combining multiple models in Spark-ML 2.0

2016-08-23 Thread janardhan shetty
Any methods to achieve this? On Aug 22, 2016 3:40 PM, "janardhan shetty" <janardhan...@gmail.com> wrote: > Hi, > > Are there any pointers, links on stacking multiple models in spark > dataframes ?. WHat strategies can be employed if we need to combine greater > than 2 models ? >

Combining multiple models in Spark-ML 2.0

2016-08-22 Thread janardhan shetty
Hi, Are there any pointers or links on stacking multiple models in spark dataframes? What strategies can be employed if we need to combine more than 2 models?
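
No stacking utility ships with Spark ML 2.0, so one hedged sketch (column names and the choice of base/meta learners are assumptions, and out-of-fold predictions would be preferable in practice) is to feed the base models' predictions into a meta-model via VectorAssembler:

import org.apache.spark.ml.classification.{LogisticRegression, RandomForestClassifier}
import org.apache.spark.ml.feature.VectorAssembler

// train is assumed to have "label" and "features" columns.
val lrModel = new LogisticRegression().fit(train)
val rfModel = new RandomForestClassifier().fit(train)

// Collect each base model's prediction as a plain column.
val withLr = lrModel.transform(train)
  .withColumnRenamed("prediction", "lrPred")
  .drop("rawPrediction", "probability")
val withBoth = rfModel.transform(withLr)
  .withColumnRenamed("prediction", "rfPred")
  .drop("rawPrediction", "probability")

// Meta-model trained on the stacked base predictions.
val assembler = new VectorAssembler()
  .setInputCols(Array("lrPred", "rfPred"))
  .setOutputCol("metaFeatures")
val metaModel = new LogisticRegression()
  .setFeaturesCol("metaFeatures")
  .fit(assembler.transform(withBoth))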

Re: Vector size mismatch in logistic regression - Spark ML 2.0

2016-08-22 Thread janardhan shetty
ists.apache.org/ > thread.html/a7e06426fd958665985d2c4218ea2f9bf9ba136ddefe83e1ad6f1727@% > 3Cuser.spark.apache.org%3E for some details). > > > > On Mon, 22 Aug 2016 at 03:20 janardhan shetty <janardhan...@gmail.com> > wrote: > >> Thanks Krishna for your response.

Re: Vector size mismatch in logistic regression - Spark ML 2.0

2016-08-21 Thread janardhan shetty
, should be 15,909 >> - If you expect it to be 29,471, then the X Matrix is not right. >> 2. It is also probable that the size of the test-data is something >>else. If so, check the data pipeline. >>3. If you print the count() of the various vectors, I thin

Vector size mismatch in logistic regression - Spark ML 2.0

2016-08-21 Thread janardhan shetty
Hi, I have built a logistic regression model using the training dataset. When I am predicting on a test dataset, it is throwing the below error of size mismatch. Steps done: 1. String indexers on categorical features. 2. One hot encoding on these indexed features. Any help is appreciated to
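
A hedged sketch of the usual way to avoid train/test size mismatches with indexing and one-hot encoding (column names are assumptions): fit a single Pipeline on the training set and reuse the fitted PipelineModel on the test set, so both sides share one feature space:

import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}

val indexer = new StringIndexer()
  .setInputCol("category").setOutputCol("categoryIdx").setHandleInvalid("skip")
val encoder = new OneHotEncoder()
  .setInputCol("categoryIdx").setOutputCol("categoryVec")
val assembler = new VectorAssembler()
  .setInputCols(Array("categoryVec")).setOutputCol("features")
val lr = new LogisticRegression()

val pipeline = new Pipeline()
  .setStages(Array[PipelineStage](indexer, encoder, assembler, lr))
val model = pipeline.fit(train)       // categories and vector sizes are fixed here
val scored = model.transform(test)    // the same feature space is applied to test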

Re: SPARK MLLib - How to tie back Model.predict output to original data?

2016-08-18 Thread janardhan shetty
There is a spark-ts package developed by Sandy which has an RDD version. Not sure about the dataframe roadmap. http://sryza.github.io/spark-timeseries/0.3.0/index.html On Aug 18, 2016 12:42 AM, "ayan guha" wrote: > Thanks a lot. I resolved it using an UDF. > > Qs: does spark

Re: Spark ML : One hot Encoding for multiple columns

2016-08-17 Thread janardhan shetty
ation: > https://spark.apache.org/docs/2.0.0-preview/ml-features.html#onehotencoder, > I see that it still accepts one column at a time. > > On Wed, Aug 17, 2016 at 10:18 AM, janardhan shetty <janardhan...@gmail.com > > wrote: > >> 2.0: >> >> One hot encoding currently accepts single input column is there a way to >> include multiple columns ? >> > >

Spark ML : One hot Encoding for multiple columns

2016-08-17 Thread janardhan shetty
2.0: One-hot encoding currently accepts a single input column; is there a way to include multiple columns?
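
OneHotEncoder in 2.0 is indeed single-column, so the usual pattern (sketched below with hypothetical column names) is to generate an indexer/encoder pair per column and chain them all in one Pipeline:

import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}

val categoricalCols = Array("category", "newone")   // hypothetical column names
val stages: Array[PipelineStage] = categoricalCols.flatMap { c =>
  Array[PipelineStage](
    new StringIndexer().setInputCol(c).setOutputCol(s"${c}_idx"),
    new OneHotEncoder().setInputCol(s"${c}_idx").setOutputCol(s"${c}_vec"))
}
val pipeline = new Pipeline().setStages(stages)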

Re: Using spark package XGBoost

2016-08-14 Thread janardhan shetty
Any leads on how to achieve this? On Aug 12, 2016 6:33 PM, "janardhan shetty" <janardhan...@gmail.com> wrote: > I tried using *sparkxgboost package *in build.sbt file but it failed. > Spark 2.0 > Scala 2.11.8 > > Error: > [warn] http://dl.bintray.com/spark-

Re: Using spark package XGBoost

2016-08-12 Thread janardhan shetty
; => MergeStrategy.first case "application.conf" => MergeStrategy.concat case "unwanted.txt"=> MergeStrategy.discard case x => val oldStrategy = (assemblyMergeStrategy in assembly).value oldStrategy(x) } On Fri, Aug 12, 2016 at 3:35 PM, janardhan shetty <janardhan...@gmail.com> wrote: > Is there a dataframe version of XGBoost in spark-ml ?. > Has anyone used sparkxgboost package ? >

Using spark package XGBoost

2016-08-12 Thread janardhan shetty
Is there a dataframe version of XGBoost in spark-ml? Has anyone used the sparkxgboost package?

Re: Symbol HasInputCol is inaccessible from this place

2016-08-08 Thread janardhan shetty
Can some experts shed light on this one? Still facing issues with extends HasInputCol and DefaultParamsWritable On Mon, Aug 8, 2016 at 9:56 AM, janardhan shetty <janardhan...@gmail.com> wrote: > you mean is it deprecated ? > > On Mon, Aug 8, 2016 at 5:02 AM, Strange, Nick <ni

Re: Symbol HasInputCol is inaccessible from this place

2016-08-08 Thread janardhan shetty
you mean is it deprecated ? On Mon, Aug 8, 2016 at 5:02 AM, Strange, Nick <nick.stra...@fmr.com> wrote: > What possible reason do they have to think its fragmentation? > > > > *From:* janardhan shetty [mailto:janardhan...@gmail.com] > *Sent:* Saturday, August 06, 201

Re: [Spark1.6] Or (||) operator not working in DataFrame

2016-08-07 Thread janardhan shetty
Can you try 'or' keyword instead? On Aug 7, 2016 7:43 AM, "Divya Gehlot" wrote: > Hi, > I have use case where I need to use or[||] operator in filter condition. > It seems its not working its taking the condition before the operator and > ignoring the other filter

Re: Symbol HasInputCol is inaccessible from this place

2016-08-06 Thread janardhan shetty
ms { > > On Thu, Aug 4, 2016 at 1:18 PM, janardhan shetty <janardhan...@gmail.com> > wrote: > >> Version : 2.0.0-preview >> >> import org.apache.spark.ml.param._ >> import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol} >> >> >

Re: Symbol HasInputCol is inaccessible from this place

2016-08-06 Thread janardhan shetty
Any thoughts or suggestions on this error? On Thu, Aug 4, 2016 at 1:18 PM, janardhan shetty <janardhan...@gmail.com> wrote: > Version : 2.0.0-preview > > import org.apache.spark.ml.param._ > import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol} > > &

Re: Generating unique id for a column in Row without breaking into RDD and joining back

2016-08-05 Thread janardhan shetty
Mike, Any suggestions on doing it for consecutive ids? On Aug 5, 2016 9:08 AM, "Tony Lane" wrote: > Mike. > > I have figured how to do this . Thanks for the suggestion. It works > great. I am trying to figure out the performance impact of this. > > thanks again > > >

Symbol HasInputCol is inaccessible from this place

2016-08-04 Thread janardhan shetty
Version : 2.0.0-preview import org.apache.spark.ml.param._ import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol} class CustomTransformer(override val uid: String) extends Transformer with HasInputCol with HasOutputCol with DefaultParamsWritable import
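
HasInputCol, HasOutputCol and the other shared-param traits are private[ml], so code outside that package cannot mix them in. A hedged workaround sketch that declares its own params instead (the transformer itself is a toy example, not from the thread):

import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.upper
import org.apache.spark.sql.types.{StringType, StructField, StructType}

class UpperCaseTransformer(override val uid: String) extends Transformer {
  def this() = this(Identifiable.randomUID("upperCase"))

  // Own params, since the shared HasInputCol/HasOutputCol traits are private[ml].
  final val inputCol  = new Param[String](this, "inputCol", "input column name")
  final val outputCol = new Param[String](this, "outputCol", "output column name")
  def setInputCol(value: String): this.type  = set(inputCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)

  override def transform(dataset: Dataset[_]): DataFrame =
    dataset.withColumn($(outputCol), upper(dataset($(inputCol))))

  override def transformSchema(schema: StructType): StructType =
    StructType(schema.fields :+ StructField($(outputCol), StringType, nullable = true))

  override def copy(extra: ParamMap): UpperCaseTransformer = defaultCopy(extra)
}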

Re: describe function limit of columns

2016-08-02 Thread janardhan shetty
If you are referring to limiting the # of columns, you can select the columns and describe: df.select("col1", "col2").describe().show() On Tue, Aug 2, 2016 at 6:39 AM, pseudo oduesp wrote: > Hi > in spark 1.5.0 i used descibe function with more than 100 columns . > someone

Re: Writing custom Transformers and Estimators like Tokenizer in spark ML

2016-08-01 Thread janardhan shetty
What is the difference between the UnaryTransformer and Transformer classes? In which scenarios should we use one or the other? On Sun, Jul 31, 2016 at 8:27 PM, janardhan shetty <janardhan...@gmail.com> wrote: > Developing in scala but any help with difference between UnaryTr

Re: Writing custom Transformers and Estimators like Tokenizer in spark ML

2016-07-31 Thread janardhan shetty
> > On Fri, Jul 29, 2016 at 9:01 AM, janardhan shetty > <janardhan...@gmail.com> wrote: > > Thanks Steve. > > > > Any pointers to custom estimators development as well ? > > > > On Wed, Jul 27, 2016 at 11:35 AM, Steve Rowe <sar...@gmail.com> w

Re: Writing custom Transformers and Estimators like Tokenizer in spark ML

2016-07-28 Thread janardhan shetty
alysis component, here: < > https://lucidworks.com/blog/2016/04/13/spark-solr-lucenetextanalyzer/>. > > -- > Steve > www.lucidworks.com > > > On Jul 27, 2016, at 1:31 PM, janardhan shetty <janardhan...@gmail.com> > wrote: > > > > 1. Any links or blogs to

Re: ORC v/s Parquet for Spark 2.0

2016-07-27 Thread janardhan shetty
>> >>>> I think both are very similar, but with slightly different goals. While >>>> they work transparently for each Hadoop application you need to enable >>>> specific support in the application for predicate push down. >>>> In the end you h

Writing custom Transformers and Estimators like Tokenizer in spark ML

2016-07-27 Thread janardhan shetty
1. Any links or blogs to develop *custom* transformers? e.g. Tokenizer 2. Any links or blogs to develop *custom* estimators? e.g. any ml algorithm
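
For the Tokenizer-style case, UnaryTransformer already supplies the input/output column params; a minimal sketch of a custom one (a toy whitespace tokenizer, not from the thread):

import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{ArrayType, DataType, StringType}

class SimpleTokenizer(override val uid: String)
  extends UnaryTransformer[String, Seq[String], SimpleTokenizer] {

  def this() = this(Identifiable.randomUID("simpleTok"))

  // The one-column transformation itself: lowercase and split on whitespace.
  override protected def createTransformFunc: String => Seq[String] =
    _.toLowerCase.split("\\s+").toSeq

  override protected def outputDataType: DataType =
    ArrayType(StringType, containsNull = false)
}

// usage: new SimpleTokenizer().setInputCol("text").setOutputCol("words").transform(df)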

Re: Maintaining order of pair rdd

2016-07-26 Thread janardhan shetty
e) > > then you can do this > val reduced = myRDD.reduceByKey((first, second) => first ++ second) > > val sorted = reduced.sortBy(tpl => tpl._1) > > hth > > > > On Tue, Jul 26, 2016 at 3:31 AM, janardhan shetty <janardhan...@gmail.com> > wrote

Re: ORC v/s Parquet for Spark 2.0

2016-07-25 Thread janardhan shetty
n may be one should choose Parquet > 5) AFAIK, Parquet has its metadata at the end of the file (correct me if > something has changed) . It means that Parquet file must be completely read > & put into RAM. If there is no enough RAM or file somehow is corrupted --> > proble

Re: Maintaining order of pair rdd

2016-07-25 Thread janardhan shetty
e a sortWith. > Basically , a groupBy reduces your structure to (anyone correct me if i m > wrong) a RDD[(key,val)], which you can see as a tuple.so you could use > sortWith (or sortBy, cannot remember which one) (tpl=> tpl._1) > hth > > On Mon, Jul 25, 2016 at 1:2

ORC v/s Parquet for Spark 2.0

2016-07-25 Thread janardhan shetty
Just wondering about the advantages and disadvantages of converting data into ORC or Parquet. In the Spark documentation there are numerous examples of the Parquet format. Any strong reasons to choose Parquet over the ORC file format? Also: current data compression is bzip2

Re: Bzip2 to Parquet format

2016-07-25 Thread janardhan shetty
ataframe.save(“/path”) to create a parquet file. > > Reference for SQLContext / createDataFrame: > http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.SQLContext > > > > On Jul 24, 2016, at 5:34 PM, janardhan shetty <janardhan...@gmail.com> > wrote: >

Bzip2 to Parquet format

2016-07-24 Thread janardhan shetty
We have data in the Bz2 compression format. Any links in Spark to convert it into Parquet, and also any performance benchmarks and study materials?
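
A minimal conversion sketch (paths and the header option are assumptions): Spark decompresses .bz2 input transparently through the Hadoop codec, so the conversion is just a read followed by a Parquet write:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("Bz2ToParquet").getOrCreate()

// .bz2 files are decompressed on read; adjust the reader to the actual file layout.
val df = spark.read
  .option("header", "true")
  .csv("/data/input/*.csv.bz2")

df.write.parquet("/data/output/parquet")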

K-means Evaluation metrics

2016-07-24 Thread janardhan shetty
Hi, I was trying to evaluate k-means clustering predictions since the exact cluster numbers were provided beforehand for each data point. Just tried Error = Predicted cluster number - Given number as a brute-force method. What are the evaluation metrics available in Spark for K-means
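
In Spark 2.0 ML the built-in K-means metric is the within-set sum of squared errors via computeCost; a brief sketch assuming a `dataset` DataFrame with a "features" column:

import org.apache.spark.ml.clustering.KMeans

val kmeans = new KMeans().setK(3).setFeaturesCol("features").setSeed(1L)
val model = kmeans.fit(dataset)

// WSSSE: lower is tighter clusters; useful for comparing runs or choosing k.
val wssse = model.computeCost(dataset)
println(s"Within Set Sum of Squared Errors = $wssse")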

Re: Maintaining order of pair rdd

2016-07-24 Thread janardhan shetty
st:List[(Int, Int, Int)]):T = { > if (lst.isEmpty): /// return your comparison > else { > val splits = lst.splitAt(5) > // do sometjhing about it using splits._1 > iterate(splits._2) >} > > will this help? or am i still missing somethi

Frequent Item Pattern Spark ML Dataframes

2016-07-24 Thread janardhan shetty
Is there any implementation of FPGrowth and association rules for Spark Dataframes? We have it for RDDs, but any pointers to Dataframes?
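
A DataFrame-based FPGrowth is not in Spark 2.0, so one workaround is to drop to the RDD API from the DataFrame; a hedged sketch assuming `transactionsDF` has an array<string> column named "items":

import org.apache.spark.mllib.fpm.FPGrowth

// Convert the items column into the RDD[Array[String]] shape mllib FPGrowth expects.
val itemsRdd = transactionsDF
  .select("items")
  .rdd
  .map(_.getSeq[String](0).toArray)

val fpg = new FPGrowth().setMinSupport(0.2).setNumPartitions(4)
val model = fpg.run(itemsRdd)
model.freqItemsets.collect().foreach { fi =>
  println(fi.items.mkString("[", ",", "]") + ", " + fi.freq)
}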

Re: Maintaining order of pair rdd

2016-07-24 Thread janardhan shetty
. Similarly next 5 elements in that order until the end of number of elements. Let me know if this helps On Sun, Jul 24, 2016 at 7:45 AM, Marco Mistroni <mmistr...@gmail.com> wrote: > Apologies I misinterpreted could you post two use cases? > Kr > > On 24 Jul 2016 3:41 pm,

Re: Maintaining order of pair rdd

2016-07-24 Thread janardhan shetty
Marco, Thanks for the response. It is indexed order and not ascending or descending order. On Jul 24, 2016 7:37 AM, "Marco Mistroni" <mmistr...@gmail.com> wrote: > Use map values to transform to an rdd where values are sorted? > Hth > > On 24 Jul 2016 6:23 am, &

Locality sensitive hashing

2016-07-24 Thread janardhan shetty
I was looking into implementing locality-sensitive hashing on dataframes. Any pointers for reference?
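
Nothing DataFrame-based existed at the time of this message; MinHashLSH and BucketedRandomProjectionLSH landed in Spark 2.1, so a sketch assuming that version (dfA/dfB with a "features" vector column are assumptions):

import org.apache.spark.ml.feature.MinHashLSH

val mh = new MinHashLSH()
  .setNumHashTables(3)
  .setInputCol("features")
  .setOutputCol("hashes")

val model = mh.fit(dfA)
// Approximate similarity join: pairs whose Jaccard distance is below the threshold.
model.approxSimilarityJoin(dfA, dfB, 0.6).show()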

Maintaining order of pair rdd

2016-07-23 Thread janardhan shetty
I have a key-value pair RDD where the value is an array of Ints. I need to maintain the order of the values in order to execute downstream modifications. How do we maintain the order of values? Ex: rdd = (id1,[5,2,3,15], Id2,[9,4,2,5]) Follow-up question: how do we compare between one element in rdd
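
One way to read the example (a hedged sketch, with sc assumed): the order inside each value array is already stable; only the ordering of the key-value pairs across the RDD needs an explicit sort, and element-wise comparison between two ids can then use zip:

val rdd = sc.parallelize(Seq(("id1", Array(5, 2, 3, 15)), ("id2", Array(9, 4, 2, 5))))

// Deterministic ordering of the pairs themselves:
val ordered = rdd.sortByKey().collect()

// Element-by-element comparison between two ids, positions kept aligned:
val byKey = rdd.collectAsMap()
val pairs = byKey("id1").zip(byKey("id2"))   // Array((5,9), (2,4), (3,2), (15,5))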

Re: Unresolved dependencies while creating spark application Jar

2016-07-22 Thread janardhan shetty
ng Apache Spark http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > > On Fri, Jul 22, 2016 at 4:23 PM, janardhan shetty > <janardhan...@gmail.com> wrote: > > Changed to sbt.0.14.3 and it gave : > > > > [info] Packaging > &g

Re: Unresolved dependencies while creating spark application Jar

2016-07-22 Thread janardhan shetty
Do we need to create an assembly.sbt file inside the project directory? If so, what will its contents be for this config? On Fri, Jul 22, 2016 at 5:42 AM, janardhan shetty <janardhan...@gmail.com> wrote: > Is scala version also the culprit? 2.10 and 2.11.8 > > Also Can you give the step

Re: Unresolved dependencies while creating spark application Jar

2016-07-22 Thread janardhan shetty
ki > > https://medium.com/@jaceklaskowski/ > Mastering Apache Spark http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > > On Fri, Jul 22, 2016 at 2:08 PM, janardhan shetty > <janardhan...@gmail.com> wrote: > > Hi, &

Unresolved dependencies while creating spark application Jar

2016-07-22 Thread janardhan shetty
Hi, I was setting up my development environment. Local Mac laptop setup: IntelliJ IDEA 14 CE, Scala, sbt (not Maven). Error: $ sbt package [warn] :: [warn] :: UNRESOLVED DEPENDENCIES :: [warn]
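
For reference, a minimal build.sbt that resolves cleanly for a Spark 2.0 / Scala 2.11 project (the versions are assumptions; they should match the cluster you run against):

// build.sbt sketch: Spark artifacts marked "provided" since the cluster supplies them.
name := "spark-app"
version := "0.1"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-sql"   % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.0.0" % "provided"
)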