Parquet

2018-07-19 Thread amin mohebbi
We have two big tables, each with 5 billion rows. Should we partition/sort the data and convert it to Parquet before doing any join? Best Regards ... Amin Mohebbi PhD candidate in Software Engineering

Unpivoting

2018-07-10 Thread amin mohebbi
Does anyone know how to transpose columns in Spark (Scala)? This is how I want to unpivot the table: How to unpivot the table based on the multiple columns. I am using Scala and Spark to unpivot a

Interactive queries

2018-06-29 Thread amin mohebbi
... Amin Mohebbi PhD candidate in Software Engineering at university of Malaysia Tel : +60 18 2040 017 E-Mail : tp025...@ex.apiit.edu.my amin_...@me.com

submitting dependencies

2018-06-26 Thread amin mohebbi
-jars /home/sshuser/reactiveinflux-spark_2.10-1.4.0.10.0.5.1.jar sapn_2.11-1.0.jar Can you help me solve this issue?

Big data visualization

2018-05-27 Thread amin mohebbi
? file system / time series DB / Azure Cosmos / standard DB? 2- Is it the right approach to use Spark as an ETL and aggregation application, store the results somewhere, and use Power BI for reporting and dashboards?

Time series data

2018-05-24 Thread amin mohebbi
with NoSQL, as I think a combination of these two could provide random access and let different users run many queries. 2- Do we really need to use a time series DB?

Mllib Error

2014-12-11 Thread amin mohebbi
: unresolved dependency spark-mllib;1.1.1 : not found. Does anyone know how to add the MLlib dependency in an .sbt file?
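For the MLlib dependency question above, a minimal build.sbt sketch follows. The project name and the Scala version are assumptions for illustration, not details from the thread. It uses the `org.apache.spark` group ID with `%%`, so sbt appends the Scala binary version to the artifact name. An unresolved-dependency error of this kind often comes from a missing group ID, a single `%` where `%%` is needed, or a Scala version for which the artifact was not published.

```scala
// build.sbt (sketch; project name and Scala version are assumed)
name := "mllib-example"

scalaVersion := "2.10.4"

// %% appends the Scala binary version, so this resolves to spark-mllib_2.10
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.1.1",
  "org.apache.spark" %% "spark-mllib" % "1.1.1"
)
```

If the build targets a different Scala version, the matching `_2.xx` artifact has to exist in the repository for the dependency to resolve.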

Mllib error

2014-12-09 Thread amin mohebbi

K-means clustering

2014-11-25 Thread amin mohebbi
that I do not want to use MLlib and would like to write my own k-means.

k-means clustering

2014-11-18 Thread amin mohebbi

canopy clustering

2014-11-10 Thread amin mohebbi

Kmeans

2014-07-16 Thread amin mohebbi
Can anyone explain the difference between k-means in MLlib and k-means in examples/src/main/python/kmeans.py?

Re: spark Driver

2014-07-09 Thread amin mohebbi
a separate machine for the driver? On Wednesday

How to host spark driver

2014-07-09 Thread amin mohebbi
-submit?

spark Driver

2014-07-08 Thread amin mohebbi
CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave2:33758] - [akka.tcp://spark@master:54477] disassociated! Shutting down.