Profiling memory use and access

2016-04-24 Thread Edmon Begoli
I am working on experimental research into profiling memory use and allocation by machine learning functions across a number of popular libraries. Is there a facility within Spark, and MLlib specifically, to track the allocation and use of data frames/memory by MLlib? Please

When did Spark start supporting ORC and Parquet?

2016-04-14 Thread Edmon Begoli
I need this fact for the research paper I am writing right now. When did Spark start supporting Parquet, and when ORC (which release)? I would appreciate any info you can offer. Thank you, Edmon

Small-cluster deployment modes

2015-07-24 Thread Edmon Begoli
Hey folks, I want to set up a single machine or a small cluster to run our Spark-based exploration lab. Does anyone have suggestions or metrics on the feasibility of running Spark standalone, without a resource manager, on a machine with a good amount of RAM (64GB) and SSDs? I expect one or two users

Reducing Spark's logging verbosity

2015-03-21 Thread Edmon Begoli
Hi, does anyone have concrete recommendations on how to reduce Spark's logging verbosity? We have attempted on several occasions to address this by setting various log4j properties, both in configuration property files and in $SPARK_HOME/conf/spark-env.sh; however, all of those attempts have

Spark on HDFS vs. Lustre vs. other file systems - formal research and performance evaluation

2015-03-13 Thread Edmon Begoli
Thank you, *Edmon Begoli, PhD* Chief Data Officer Joint Institute for Computational Sciences (JICS) ebeg...@tennessee.edu https://www.linkedin.com/in/ebegoli

Spark-SQL and Hive - is Hive required?

2015-03-06 Thread Edmon Begoli
Does Spark-SQL require Hive to be installed in order to run correctly? I could not tell from this statement: https://spark.apache.org/docs/latest/sql-programming-guide.html#compatibility-with-apache-hive Thank you, Edmon