Unsubscribe

2018-10-05 Thread Donni Khan

the best tool to interact with Spark

2018-06-26 Thread Donni Khan
Hi all, What is the best tool to interact easily with Spark? Thank you, Donni

problem with saving RandomForestClassifier model - Spark Java

2018-05-22 Thread Donni Khan
Hi Spark users, I built a random forest model using Spark 1.6 with Java. I'm getting the following exception: User class threw exception: java.lang.UnsupportedOperationException: Pipeline write will fail on this Pipeline because it contains a stage which does not implement Writable. Does
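
A likely cause is that the random-forest stage itself does not implement ML persistence in Spark 1.6; pipeline save/load for tree-ensemble models only arrived in later releases. A minimal sketch of saving and reloading the fitted pipeline, assuming an upgrade to Spark 2.x (the path, column names, and the trainingData/testData datasets are illustrative):

    import org.apache.spark.ml.Pipeline;
    import org.apache.spark.ml.PipelineModel;
    import org.apache.spark.ml.PipelineStage;
    import org.apache.spark.ml.classification.RandomForestClassifier;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // Assemble the pipeline as before (feature stages omitted for brevity).
    RandomForestClassifier rf = new RandomForestClassifier()
        .setLabelCol("label")
        .setFeaturesCol("features");
    Pipeline pipeline = new Pipeline().setStages(new PipelineStage[]{rf});

    // Fit and persist; save() only works when every stage implements MLWritable.
    PipelineModel model = pipeline.fit(trainingData);
    model.write().overwrite().save("/models/rf-pipeline");

    // Reload later for scoring.
    PipelineModel reloaded = PipelineModel.load("/models/rf-pipeline");
    Dataset<Row> predictions = reloaded.transform(testData);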

Tuning Resource Allocation during runtime

2018-04-27 Thread Donni Khan
Hi All, Is there any way to change the number of executors/cores while a Spark job is running? I have a Spark job containing two tasks: the first task needs many executors to run fast; the second task has many input and output operations and shuffling, so it needs few executors, otherwise it takes too long
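
Spark cannot resize a fixed allocation in the middle of a job, but dynamic allocation lets the executor count grow for the first task and shrink for the second. A minimal sketch of the relevant settings (the min/max values are illustrative; the external shuffle service must be running on the cluster):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    SparkConf conf = new SparkConf()
        .setAppName("two-phase-job")
        // Let Spark add and remove executors based on the pending-task backlog.
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.dynamicAllocation.minExecutors", "2")
        .set("spark.dynamicAllocation.maxExecutors", "50")
        // Required so shuffle files survive when an executor is removed.
        .set("spark.shuffle.service.enabled", "true");

    JavaSparkContext sc = new JavaSparkContext(conf);

The same keys can also be passed on the command line with spark-submit --conf instead of being hard-coded.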

run a huge number of queries in Spark

2018-04-04 Thread Donni Khan
Hi all, I want to run a huge number of queries on a DataFrame in Spark. I have a big collection of text documents; I loaded all documents into a Spark DataFrame and created a temp table. dataFrame.registerTempTable("table1"); I have more than 50,000 terms, and I want to get the document frequency for each by
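
Rather than running 50,000 separate queries against the temp table, the document frequency of every term can be computed in one pass by exploding the token column and counting distinct documents per term. A sketch, assuming the DataFrame has a document-id column "docId" and a tokenized column "tokens" (both names are assumptions):

    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.countDistinct;
    import static org.apache.spark.sql.functions.explode;
    import org.apache.spark.sql.DataFrame;

    // One row per (docId, term) occurrence, then count distinct documents per term.
    DataFrame docFreq = dataFrame
        .select(col("docId"), explode(col("tokens")).as("term"))
        .groupBy("term")
        .agg(countDistinct("docId").as("df"));

    // Restrict to the 50,000 terms of interest, e.g. by joining with a DataFrame of those terms.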

Re: Calculate co-occurring terms

2018-03-27 Thread Donni Khan
Hi again, I found an example in Scala <https://stackoverflow.com/questions/43797758/calculate-co-occurrence-terms-with-spark-using-scala?rq=1>, but I don't have any experience with Scala. Can anyone convert it to Java, please? Thank you, Donni On Fri, Mar 23, 2018 at 8:57 AM, Donni Khan <p
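
Since the linked answer is Scala, here is a rough Java sketch of the same idea rather than a line-by-line translation: emit every unordered pair of terms that co-occur in a document and sum the counts. It assumes Spark 2.x's Java API (where flat-map functions return an Iterator) and a JavaRDD<List<String>> of per-document term lists called docTerms:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.spark.api.java.JavaPairRDD;
    import scala.Tuple2;

    // docTerms: JavaRDD<List<String>>, one list of significant terms per document (assumption).
    JavaPairRDD<Tuple2<String, String>, Integer> cooccurrence = docTerms
        .flatMapToPair(terms -> {
            List<Tuple2<Tuple2<String, String>, Integer>> pairs = new ArrayList<>();
            for (int i = 0; i < terms.size(); i++) {
                for (int j = i + 1; j < terms.size(); j++) {
                    String a = terms.get(i);
                    String b = terms.get(j);
                    // Order the pair so (a, b) and (b, a) land on the same key.
                    Tuple2<String, String> key =
                        a.compareTo(b) <= 0 ? new Tuple2<>(a, b) : new Tuple2<>(b, a);
                    pairs.add(new Tuple2<>(key, 1));
                }
            }
            return pairs.iterator();
        })
        .reduceByKey(Integer::sum);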

Calculate co-occurring terms

2018-03-23 Thread Donni Khan
Hi, I have a collection of text documents and I extracted the list of significant terms from that collection. I want to calculate the co-occurrence matrix for the extracted terms using Spark. I actually stored the collection of text documents in a DataFrame, StructType schema = new

high TFIDF value terms

2018-02-05 Thread Donni Khan
Hi, does anyone know how I can get the high-TF-IDF terms using Spark (Java)? IDF idf = new IDF().setInputCol("TF").setOutputCol("IDF"); IDFModel idfModel = idf.fit(featurizedData); DataFrame tfidf = idfModel.transform(featurizedData); Thanks, Donni
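
A HashingTF column cannot be mapped back to words, so this sketch assumes the "TF" column came from a CountVectorizer, whose learned vocabulary translates vector indices back into terms, and Spark 2.x ml vector types; it prints the highest-weighted term of each document. cvModel is an assumed variable, and collecting to the driver is only for illustration, for large data the same loop would run inside a map:

    import org.apache.spark.ml.feature.CountVectorizerModel;
    import org.apache.spark.ml.linalg.SparseVector;
    import org.apache.spark.sql.Row;

    // Vocabulary of the CountVectorizer that produced the "TF" column (assumption).
    String[] vocab = cvModel.vocabulary();

    for (Row row : tfidf.select("IDF").collectAsList()) {
        SparseVector v = (SparseVector) row.get(0);
        int[] indices = v.indices();
        double[] values = v.values();
        if (values.length == 0) continue;

        // Index of the largest TF-IDF weight in this document's vector.
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) best = i;
        }
        System.out.println(vocab[indices[best]] + " -> " + values[best]);
    }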

Singular Value Decomposition (SVD) in Spark Java

2018-01-31 Thread Donni Khan
Hi, I would like to use Singular Value Decomposition (SVD) to extract the important concepts from a collection of text documents. I applied the whole preprocessing pipeline (Tokenizer, IDFModel, Matrix, ...), then I applied SVD: SingularValueDecomposition svd =
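
A minimal sketch of a truncated SVD over the TF-IDF vectors with the mllib RowMatrix API; k and the vectorsRDD variable are illustrative. The V matrix relates terms to the extracted concepts, and U places documents in concept space:

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.mllib.linalg.Matrix;
    import org.apache.spark.mllib.linalg.SingularValueDecomposition;
    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.linalg.distributed.RowMatrix;

    // vectorsRDD: JavaRDD<Vector> of TF-IDF document vectors (assumption).
    RowMatrix mat = new RowMatrix(vectorsRDD.rdd());

    int k = 20;  // number of latent concepts to keep
    SingularValueDecomposition<RowMatrix, Matrix> svd = mat.computeSVD(k, true, 1.0e-9);

    RowMatrix U = svd.U();  // documents-by-concepts
    Vector s = svd.s();     // singular values (concept strengths)
    Matrix V = svd.V();     // terms-by-concepts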

cosine similarity implementation in Java Spark

2017-12-14 Thread Donni Khan
Hi all, is there any implementation of cosine similarity that supports Java? Thanks, Donni

cosine similarity in Java Spark

2017-12-14 Thread Donni Khan
Hi all, is there any implementation of cosine similarity that supports Java? Thanks, Donni
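
There is no ready-made pairwise cosine-similarity helper in Spark's Java API (RowMatrix.columnSimilarities covers the all-pairs case, see the threads below), but for two mllib vectors it is just the dot product over the product of the norms. A hand-rolled sketch, assuming both vectors have the same dimension:

    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.linalg.Vectors;

    // Cosine similarity between two mllib vectors (dense or sparse) of equal size.
    static double cosine(Vector a, Vector b) {
        double[] x = a.toArray();
        double[] y = b.toArray();
        double dot = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
        }
        double norms = Vectors.norm(a, 2.0) * Vectors.norm(b, 2.0);
        return norms == 0.0 ? 0.0 : dot / norms;
    }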

Cosine Similarity between documents - Rows

2017-11-27 Thread Donni Khan
I have a Spark job to compute the similarity between text documents: RowMatrix rowMatrix = new RowMatrix(vectorsRDD.rdd()); CoordinateMatrix rowsimilarity = rowMatrix.columnSimilarities(0.5); JavaRDD entries = rowsimilarity.entries().toJavaRDD(); List list = entries.collect(); for (MatrixEntry s :

cosine similarity between rows

2017-10-27 Thread Donni Khan
I have a Spark job to compute the similarity between text documents: RowMatrix rowMatrix = new RowMatrix(vectorsRDD.rdd()); CoordinateMatrix rowsimilarity = rowMatrix.columnSimilarities(0.5); JavaRDD entries = rowsimilarity.entries().toJavaRDD(); List list = entries.collect(); for (MatrixEntry s :
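
One caveat with this code: columnSimilarities() compares the columns of the matrix (the features/terms), not the rows (the documents), and collect() pulls every similarity entry to the driver. One way to compare documents instead is to transpose the matrix first and keep the result distributed; a sketch (the 0.5 threshold is illustrative):

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
    import org.apache.spark.mllib.linalg.distributed.IndexedRow;
    import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix;
    import org.apache.spark.mllib.linalg.distributed.MatrixEntry;

    // Attach a row index to each document vector so the matrix can be transposed.
    JavaRDD<IndexedRow> indexedRows = vectorsRDD
        .zipWithIndex()
        .map(t -> new IndexedRow(t._2(), t._1()));

    // After transposing, documents are columns, so columnSimilarities() compares documents.
    CoordinateMatrix similarities = new IndexedRowMatrix(indexedRows.rdd())
        .toCoordinateMatrix()
        .transpose()
        .toRowMatrix()
        .columnSimilarities(0.5);

    // Inspect a few entries instead of collect()ing everything to the driver.
    JavaRDD<MatrixEntry> entries = similarities.entries().toJavaRDD();
    for (MatrixEntry e : entries.take(10)) {
        System.out.println("doc " + e.i() + " ~ doc " + e.j() + " = " + e.value());
    }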

text processing in Spark (Spark job gets stuck for several minutes)

2017-10-26 Thread Donni Khan
Hi, I'm applying preprocessing methods to a large collection of text using Spark (Java). I created my own NLP pipeline as normal Java code and call it in the map function like this: MyRDD.map(call nlp pipeline for each row). I run my job on a cluster of 14 machines (32 cores and about 140 GB each). The job
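
One common reason such a job stalls is that the NLP pipeline (models, dictionaries, annotators) is built inside map() and therefore re-initialized for every single row. The usual fix is to build it once per partition with mapPartitions; a sketch assuming Spark 2.x's Java API (where the function returns an Iterator) and a hypothetical MyNlpPipeline class standing in for the actual pipeline:

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import org.apache.spark.api.java.JavaRDD;

    JavaRDD<String> processed = myRDD.mapPartitions((Iterator<String> rows) -> {
        // Heavy initialization happens once per partition, not once per row.
        MyNlpPipeline pipeline = new MyNlpPipeline();  // hypothetical pipeline class
        List<String> out = new ArrayList<>();
        while (rows.hasNext()) {
            out.add(pipeline.process(rows.next()));
        }
        return out.iterator();
    });

Buffering the partition in a list assumes each partition fits in memory; for very large partitions, return a lazy iterator instead.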