create table in hive from spark-sql

2015-09-23 Thread Mohit Singh
Probably a noob question, but I am trying to create a Hive table using spark-sql. Here is what I am trying to do:

    hc = HiveContext(sc)
    hdf = hc.parquetFile(output_path)
    data_types = hdf.dtypes
    schema = "(" + " ,".join(map(lambda x: x[0] + " " + x[1], data_types)) + ")"
    hc.sql(" CREATE TABLE IF
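
A minimal sketch of where this snippet appears to be headed, assuming the Spark 1.x HiveContext API; the table name and the NOT EXISTS completion are placeholders, not recovered from the truncated message:

    from pyspark.sql import HiveContext

    hc = HiveContext(sc)
    hdf = hc.parquetFile(output_path)
    # dtypes is a list of (column_name, type_string) pairs, e.g. [('id', 'bigint')]
    schema = "(" + ", ".join(name + " " + dtype for name, dtype in hdf.dtypes) + ")"
    # register the parquet schema as a Hive table
    hc.sql("CREATE TABLE IF NOT EXISTS my_table " + schema)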

Re: Spark installation

2015-02-10 Thread Mohit Singh
For a local machine, I don't think there is anything to install.. Just unzip and go to $SPARK_DIR/bin/spark-shell and that will open up a REPL... On Tue, Feb 10, 2015 at 3:25 PM, King sami kgsam...@gmail.com wrote: Hi, I'm new to Spark. I want to install it on my local machine (Ubuntu 12.04). Could

Re: ImportError: No module named pyspark, when running pi.py

2015-02-09 Thread Mohit Singh
I think you have to run it as $SPARK_HOME/bin/pyspark /path/to/pi.py instead of plain python pi.py. On Mon, Feb 9, 2015 at 11:22 PM, Ashish Kumar ashish.ku...@innovaccer.com wrote: Command: sudo python ./examples/src/main/python/pi.py Error: Traceback (most recent call last):
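
The underlying issue is that the bin/pyspark launcher puts the pyspark sources on PYTHONPATH before starting Python, while a plain python invocation does not. A quick diagnostic sketch (illustrative, not from the thread):

    import os, sys

    # under plain "python", SPARK_HOME is typically unset and nothing on
    # sys.path contains the pyspark package, hence the ImportError
    print(os.environ.get('SPARK_HOME'))
    print([p for p in sys.path if 'spark' in p.lower()])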

is there a master for spark cluster in ec2

2015-01-28 Thread Mohit Singh
Hi, Probably a naive question, but I am creating a spark cluster on ec2 using the ec2 scripts in there.. But is there a master param I need to set: ./bin/pyspark --master [ ] ?? I don't yet fully understand the ec2 concepts, so I just wanted to confirm this?? Thanks -- Mohit When you want
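
For what it's worth, the spark-ec2 script launches a standalone cluster and prints the master's hostname when it finishes; a hedged sketch of pointing the driver at it (the hostname here is a placeholder):

    from pyspark import SparkContext

    # standalone masters listen on port 7077 by default
    sc = SparkContext("spark://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:7077", "my-app")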

Using third party libraries in pyspark

2015-01-22 Thread Mohit Singh
Hi, I might be asking something very trivial, but what's the recommended way of using third party libraries? I am using tables (PyTables) to read hdf5 format files.. And here is the error trace: print rdd.take(2) File /tmp/spark/python/pyspark/rdd.py, line , in take res =
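
Two common routes, sketched under assumptions (paths are placeholders): pure-Python dependencies can be shipped to the executors with addPyFile or --py-files, but C-extension libraries like PyTables have to be installed on every worker node, since only Python source/zip files get shipped:

    # ship pure-Python helper code to the executors
    sc.addPyFile("/path/to/helpers.zip")

    # equivalently at launch time:
    #   bin/pyspark --py-files /path/to/helpers.zip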

Re: How to create Track per vehicle using spark RDD

2014-10-14 Thread Mohit Singh
Perhaps it's just me, but the lag function isn't familiar to me.. But have you tried configuring Spark appropriately? http://spark.apache.org/docs/latest/configuration.html On Tue, Oct 14, 2014 at 5:37 PM, Manas Kar manasdebashis...@gmail.com wrote: Hi, I have an RDD containing Vehicle Number

Setting up jvm in pyspark from shell

2014-09-10 Thread Mohit Singh
Hi, I am using the pyspark shell and am trying to create an RDD from a numpy matrix: rdd = sc.parallelize(matrix) I am getting the following error: JVMDUMP039I Processing dump event systhrow, detail java/lang/OutOfMemoryError at 2014/09/10 22:41:44 - please wait. JVMDUMP032I JVM requested Heap dump
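
A hedged workaround sketch (sizes and counts are placeholders): raise the driver heap, and hand parallelize the matrix in smaller row chunks with an explicit slice count rather than as one huge object:

    import numpy as np

    # launch with a larger driver heap, e.g. bin/pyspark --driver-memory 4g
    matrix = np.random.rand(1000000, 100)             # placeholder data
    chunks = np.array_split(matrix, 1000)             # 1000 smaller pieces
    rdd = sc.parallelize(chunks, 1000).flatMap(list)  # one matrix row per element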

Personalized Page rank in graphx

2014-08-20 Thread Mohit Singh
Hi, I was wondering if the Personalized PageRank algorithm is implemented in graphx. If the talks and presentations are to be believed ( https://amplab.cs.berkeley.edu/wp-content/uploads/2014/02/graphx@strata2014_final.pdf) it is.. but I can't find the algo code (

Re: Question on mappartitionwithsplit

2014-08-17 Thread Mohit Singh
Building on what Davies Liu said, how about something like:

    def indexing(splitIndex, iterator, offset_lists):
        count = 0
        offset = sum(offset_lists[:splitIndex]) if splitIndex else 0
        indexed = []
        for i, e in enumerate(iterator):
            index = count + offset + i
            for j, ele in
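
A self-contained sketch of the same idea, assuming the classic two-pass recipe (count each partition, then assign global indices with mapPartitionsWithIndex); the names are illustrative, not from the original reply:

    # pass 1: element count per partition
    counts = rdd.mapPartitions(lambda it: [sum(1 for _ in it)]).collect()
    # prefix sums give each partition's starting offset
    offsets = [0]
    for c in counts[:-1]:
        offsets.append(offsets[-1] + c)

    # pass 2: attach a global index to every element
    def indexing(split_index, iterator):
        for i, e in enumerate(iterator):
            yield (offsets[split_index] + i, e)

    indexed = rdd.mapPartitionsWithIndex(indexing)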

Re: Using Python IDE for Spark Application Development

2014-08-07 Thread Mohit Singh
On Wed, Aug 6, 2014 at 6:22 PM, Mohit Singh mohit1...@gmail.com wrote: My naive set up.. Adding os.environ['SPARK_HOME'] = '/path/to/spark' and sys.path.append('/path/to/spark/python') at the top of my script. from pyspark import SparkContext from pyspark import SparkConf Execution works from within

Re: Using Python IDE for Spark Application Development

2014-08-06 Thread Mohit Singh
My naive set up.. Adding os.environ['SPARK_HOME'] = '/path/to/spark' and sys.path.append('/path/to/spark/python') at the top of my script. from pyspark import SparkContext from pyspark import SparkConf Execution works from within pycharm... Though my next step is to figure out autocompletion and I bet there
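
A runnable sketch of this setup with placeholder paths; note that on Spark 1.x you typically also need the bundled py4j zip on sys.path (the version in the filename varies by release, so adjust it):

    import os
    import sys

    os.environ['SPARK_HOME'] = '/path/to/spark'
    sys.path.append('/path/to/spark/python')
    # assumption: py4j ships under python/lib; match the version to your release
    sys.path.append('/path/to/spark/python/lib/py4j-0.8.2.1-src.zip')

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster('local[*]').setAppName('ide-test')
    sc = SparkContext(conf=conf)
    print(sc.parallelize(range(10)).sum())
    sc.stop()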

Re: Regularization parameters

2014-08-06 Thread Mohit Singh
One possible straightforward explanation might be that your solution(s) are stuck in different local minima?? And depending on your weight initialization, you are getting different parameters? Maybe use the same initial weights for both runs... or I would probably test the execution with synthetic
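
A hedged sketch of pinning down the initialization, assuming the MLlib SGD-based trainers that accept an initialWeights argument; the synthetic dataset is a placeholder in the spirit of the suggestion above:

    import numpy as np
    from pyspark.mllib.classification import LogisticRegressionWithSGD
    from pyspark.mllib.regression import LabeledPoint

    # tiny synthetic dataset, as suggested above
    data = sc.parallelize([LabeledPoint(i % 2, np.random.rand(5)) for i in range(100)])

    # the same starting point for both runs removes one source of run-to-run variance
    w0 = np.zeros(5)
    model_a = LogisticRegressionWithSGD.train(data, initialWeights=w0)
    model_b = LogisticRegressionWithSGD.train(data, initialWeights=w0)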

Reading hdf5 formats with pyspark

2014-07-28 Thread Mohit Singh
Hi, We have set up Spark on an HPC system and are trying to put some data pipelines and algorithms in place. The input data is in hdf5 (these are very high resolution brain images) and it can be read via the h5py library in python. So, my current approach (which seems to be working) is writing
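
A hedged sketch of one common pattern for this (dataset name and paths are placeholders): parallelize the file paths and open each HDF5 file on the workers with h5py, so the heavy reads happen in parallel:

    import h5py

    def read_rows(path):
        # executed on the workers; each task opens its own file handle
        with h5py.File(path, 'r') as f:
            for row in f['/images'][:]:   # '/images' is a placeholder dataset name
                yield row

    paths = sc.parallelize(['/data/scan1.h5', '/data/scan2.h5'])
    rows = paths.flatMap(read_rows)

The same caveat as the PyTables thread above applies: h5py must be installed on every worker node.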

Spark streaming

2014-05-01 Thread Mohit Singh
Hi, I guess Spark uses "streaming" in the context of streaming live data, but what I mean is something more along the lines of hadoop streaming.. where one can code in any programming language? Or is something along those lines on the cards? Thanks -- Mohit When you want success as badly as you
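
The closest built-in analogue is probably RDD.pipe, which streams each partition's elements through an external command over stdin/stdout, hadoop-streaming style; a minimal sketch (the command is a placeholder):

    # each element is written to the command's stdin as a line; each line the
    # command prints becomes an element of the result RDD
    piped = sc.parallelize(['a', 'b', 'c']).pipe('tr a-z A-Z')
    print(piped.collect())   # ['A', 'B', 'C']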