RE: removing header from csv file

2016-04-27 Thread Mishra, Abhishek
You should be doing something like this:


data = sc.textFile('file:///path1/path/test1.csv')
header = data.first()  # extract the header line
data = data.filter(lambda x: x != header)  # drop lines matching the header
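One caveat: filtering on x != header drops every line equal to the header, not just the first one. If the file could contain duplicate header-like lines, Spark's zipWithIndex drops only line 0. A plain-Python sketch of that idea (no cluster needed; the sample lines are made up):

```python
# Made-up sample lines; the Spark equivalent would be:
#   data.zipWithIndex().filter(lambda x: x[1] > 0).map(lambda x: x[0])
lines = ["user_id,status", "1,active", "2,idle"]
indexed = list(enumerate(lines))               # analogue of zipWithIndex()
data = [line for i, line in indexed if i > 0]  # keep everything but line 0
print(data)
```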
Hope it helps.

Sincerely,
Abhishek
+91-7259028700

From: nihed mbarek [mailto:nihe...@gmail.com]
Sent: Wednesday, April 27, 2016 11:29 AM
To: Divya Gehlot
Cc: Ashutosh Kumar; user @spark
Subject: Re: removing header from csv file

You can add a filter on a string that you are sure appears only in the header.
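For example, in plain Python rather than Spark, and with "user_id" as an assumed header-only token for illustration:

```python
# Drop any line containing a token that only the header contains.
lines = ["user_id,status", "1,active", "2,idle"]
data = [x for x in lines if "user_id" not in x]
print(data)
```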

On Wednesday, April 27, 2016, Divya Gehlot wrote:
Yes, you can remove the header by removing the first row.

You can use first() or head() to do that.


Thanks,
Divya

On 27 April 2016 at 13:24, Ashutosh Kumar wrote:
I see there is a library, spark-csv, which can be used for removing the header and
processing CSV files. But it seems it works with SQLContext only. Is there a
way to remove the header from CSV files without SQLContext?
Thanks
Ashutosh



--

M'BAREK Med Nihed,
Fedora Ambassador, TUNISIA, Northern Africa
http://www.nihed.com




RE: LDA topic Modeling spark + python

2016-02-24 Thread Mishra, Abhishek
Hello All,

If someone has any leads on this please help me.

Sincerely,
Abhishek

From: Mishra, Abhishek
Sent: Wednesday, February 24, 2016 5:11 PM
To: user@spark.apache.org
Subject: LDA topic Modeling spark + python


Hello All,





I am working on an LDA model; please guide me.



I have a CSV file which has two columns, "user_id" and "status". I have to
generate a word-topic distribution after aggregating by user_id; that is, I need
to model topics for users over their grouped statuses. The topic length is
2000 and the value of k, the number of words, is 3.
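Not a full answer, but the aggregation step described above — one "document" per user_id built from that user's statuses — can be sketched in plain Python (the rows are invented; in PySpark the same shape would come from groupByKey before feeding LDA):

```python
from collections import defaultdict

# Hypothetical (user_id, status) rows, as they would come out of the CSV.
rows = [("u1", "spark is fast"), ("u2", "lda on spark"), ("u1", "spark mllib")]

docs = defaultdict(list)
for user, status in rows:
    docs[user].append(status)

# One aggregated text per user, ready to be tokenized for LDA.
documents = {u: " ".join(parts) for u, parts in docs.items()}
print(documents)
```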



If you can provide me with a link or some code base on Spark with Python, I
would be grateful.





Looking forward to a reply,



Sincerely,

Abhishek




value from groupBy paired rdd

2016-02-23 Thread Mishra, Abhishek
Hello All,


I am new to Spark and Python; here is my doubt, please suggest...

I have a CSV file which has 2 columns, "user_id" and "status".
I have read it into an RDD and then removed the header of the CSV file. Then I
split each record by "," (comma) and generated a pair RDD. On that RDD I call
groupByKey. Now, when I try to gather only the values from the grouped RDD into
a list, I get exceptions. Please suggest how I can get just the values from the
grouped RDD and store them; the CSV has 2 columns and I am trying to extract
the value using x[1]. The code in PySpark:

data = sc.textFile('file:///home/cloudera/LDA-Model/Pyspark/test1.csv')
header = data.first()  # extract the header line
data = data.filter(lambda x: x != header)  # filter out the header
pairs = data.map(lambda x: (x.split(",")[0], x))  # generate a (key, value) pair RDD
grouped = pairs.groupByKey()  # group values per key
grouped_val = grouped.map(lambda x: list(x[1])).collect()
print(grouped_val)
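An alternative that often reads better in Spark is grouped.mapValues(list). The grouping logic itself can be sketched in plain Python (the pairs are made up; the ResultIterable that groupByKey yields behaves like the lists here):

```python
from collections import defaultdict

# Hypothetical (key, line) pairs as produced by the map step above.
pairs = [("u1", "u1,active"), ("u2", "u2,idle"), ("u1", "u1,away")]

grouped = defaultdict(list)
for k, v in pairs:
    grouped[k].append(v)

# Values only, one list per key -- what grouped.map(lambda x: list(x[1]))
# collects in the Spark version.
grouped_val = list(grouped.values())
print(grouped_val)
```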



Thanks in Advance,
Sincerely,
Abhishek



RE: Sample project on Image Processing

2016-02-22 Thread Mishra, Abhishek
Thank you Everyone.
I am to work on a PoC with 2 types of images; that will basically be two PoCs:
face recognition and map data processing.

I am looking into these links and hopefully will get an idea. Thanks again. I
will post queries as and when I have doubts.

Sincerely,
Abhishek

From: ndj...@gmail.com [mailto:ndj...@gmail.com]
Sent: Monday, February 22, 2016 7:31 PM
To: Sainath Palla
Cc: Mishra, Abhishek; user@spark.apache.org
Subject: Re: Sample project on Image Processing

Hi folks,

KeystoneML has some image processing features: 
http://keystone-ml.org/examples.html

 Cheers,
Ardo

Sent from my iPhone

On 22 Feb 2016, at 14:34, Sainath Palla <pallasain...@gmail.com> wrote:
Here is one simple example of Image classification in Java.

http://blogs.quovantis.com/image-classification-using-apache-spark-with-linear-svm/

Personally, I feel Python provides better libraries for image processing, but
it mostly depends on what kind of image processing you are doing.

If you are stuck at the initial stage of loading/saving images, here is sample
code to do the same, in PySpark.



from io import BytesIO  # on Python 2, use StringIO.StringIO instead

from PIL import Image
import numpy as np

# Load images as binary files: an RDD of (filename, bytes) pairs
images = sc.binaryFiles("Path")

# Convert an image to an array of shape [x, y, 3]:
# x, y are the image dimensions and 3 is for the R, G, B channels.
image_to_array = lambda rawdata: np.asarray(Image.open(BytesIO(rawdata)))

# Save each image to a file after processing.
# x has the image name and img has the image as an array,
# so convert back to a PIL Image before saving.
for x, img in imageOutIMG.toLocalIterator():
    path = "Path" + x + ".jpg"
    Image.fromarray(img).save(path)






On Mon, Feb 22, 2016 at 3:23 AM, Mishra, Abhishek
<abhishek.mis...@xerox.com> wrote:
Hello,
I am working on image processing samples and was wondering if anyone has worked
on an image processing project in Spark. Please let me know if any sample
project or example is available.

Please guide in this.
Sincerely,
Abhishek



Sample project on Image Processing

2016-02-22 Thread Mishra, Abhishek
Hello,
I am working on image processing samples and was wondering if anyone has worked
on an image processing project in Spark. Please let me know if any sample
project or example is available.

Please guide in this.
Sincerely,
Abhishek


MongoDB and Spark

2015-09-11 Thread Mishra, Abhishek
Hello ,

Is there a way to query multiple collections from MongoDB using Spark and
Java? I want to create only one Configuration object. Please help if anyone
has something on this.
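One common workaround (an assumption about the connector, not something I have tested): load each collection as its own RDD, each with its own input configuration, and union the results. The union idea in plain Python, with invented collection contents:

```python
# Two invented "collections" already fetched as lists of documents.
users = [{"_id": 1, "name": "a"}, {"_id": 2, "name": "b"}]
orders = [{"_id": 10, "item": "x"}]

# Spark analogue: sc.union([users_rdd, orders_rdd]) after loading each
# collection with its own input config.
combined = users + orders
print(len(combined))
```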


Thank You
Abhishek


RE: MongoDB and Spark

2015-09-11 Thread Mishra, Abhishek
Anything using Spark RDDs?

Abhishek

From: Sandeep Giri [mailto:sand...@knowbigdata.com]
Sent: Friday, September 11, 2015 3:19 PM
To: Mishra, Abhishek; user@spark.apache.org; d...@spark.apache.org
Subject: Re: MongoDB and Spark


use map-reduce.

On Fri, Sep 11, 2015, 14:32 Mishra, Abhishek
<abhishek.mis...@xerox.com> wrote:
Hello ,

Is there a way to query multiple collections from MongoDB using Spark and
Java? I want to create only one Configuration object. Please help if anyone
has something on this.


Thank You
Abhishek


Spark Interview Questions

2015-07-29 Thread Mishra, Abhishek
Hello,

Please help me with links or documents for Apache Spark interview questions
and answers, and also for the related tools about which questions could be
asked.

Thanking you all.

Sincerely,
Abhishek

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



RE: Spark Interview Questions

2015-07-29 Thread Mishra, Abhishek
Hello Vaquar,

I have working knowledge of and experience in Spark. I just wanted to test
myself or do a mock round of evaluation. Thank you for the reply.

Please share something if you have for the same.

Sincerely,
Abhishek

From: vaquar khan [mailto:vaquar.k...@gmail.com]
Sent: Wednesday, July 29, 2015 8:22 PM
To: Mishra, Abhishek
Cc: User
Subject: Re: Spark Interview Questions


Hi Abhishek,

Please learn Spark; there are no shortcuts to success.

Regards,
Vaquar khan
On 29 Jul 2015 11:32, Mishra, Abhishek
<abhishek.mis...@xerox.com> wrote:
Hello,

Please help me with links or some document for Apache Spark interview questions 
and answers. Also for the tools related to it ,for which questions could be 
asked.

Thanking you all.

Sincerely,
Abhishek



RE: Installation On Windows machine

2014-08-27 Thread Mishra, Abhishek
Thank you for the reply Matei,

Is there something we missed? I am able to run the Spark instance on my local
system, i.e. Windows 7, but the same set of steps does not allow me to run it
on a Windows Server 2012 machine. The black screen appears for just a fraction
of a second and disappears, and I am unable to debug it. Please guide me.

Thanks,

Abhishek


-Original Message-
From: Matei Zaharia [mailto:matei.zaha...@gmail.com] 
Sent: Saturday, August 23, 2014 9:47 AM
To: Mishra, Abhishek
Cc: user@spark.apache.org
Subject: Re: Installation On Windows machine

You should be able to just download / unzip a Spark release and run it on a 
Windows machine with the provided .cmd scripts, such as bin\spark-shell.cmd. 
The scripts to launch a standalone cluster (e.g. start-all.sh) won't work on 
Windows, but you can launch a standalone cluster manually using

bin\spark-class org.apache.spark.deploy.master.Master

and

bin\spark-class org.apache.spark.deploy.worker.Worker spark://master:port

For submitting jobs to YARN instead of the standalone cluster, spark-submit.cmd 
*may* work but I don't think we've tested it heavily. If you find issues with 
that, please let us know. But overall the instructions should be the same as on 
Linux, except you use the .cmd scripts instead of the .sh ones.

Matei

On Aug 22, 2014, at 3:01 AM, Mishra, Abhishek abhishek.mis...@xerox.com wrote:

 Hello Team,
  
 I was just trying to install spark on my windows server 2012 machine and use 
 it in my project; but unfortunately I do not find any documentation for the 
 same. Please let me know if we have drafted anything for spark users on 
 Windows. I am really in need of it as we are using Windows machine for Hadoop 
 and other tools and so cannot move back to Linux OS or anything. We run 
 Hadoop on hortonworks HDP2.0  platform and also recently I came across Spark 
 and so wanted use this even in my project for my Analytics work. Please 
 suggest me links or documents where I can move ahead with my installation and 
 usage. I want to run it on Java.
  
 Looking forward for a reply,
  
 Thanking you in Advance,
 Sincerely,
 Abhishek
  
 






RE: Installation On Windows machine

2014-08-27 Thread Mishra, Abhishek
I got it working, Matei.
Thank you. I was giving the wrong directory path. Thank you...!!

Thanks,

Abhishek Mishra





Installation On Windows machine

2014-08-22 Thread Mishra, Abhishek
Hello Team,

I was just trying to install Spark on my Windows Server 2012 machine and use it
in my project, but unfortunately I cannot find any documentation for it.
Please let me know if anything has been drafted for Spark users on Windows. I
am really in need of it, as we are using Windows machines for Hadoop and other
tools and so cannot move back to Linux. We run Hadoop on the Hortonworks
HDP 2.0 platform, and recently I came across Spark and wanted to use it in my
project for my analytics work. Please suggest links or documents so that I can
move ahead with installation and usage. I want to run it on Java.

Looking forward to a reply,

Thanking you in Advance,
Sincerely,
Abhishek

Thanks,

Abhishek Mishra
Software Engineer
Innovation Delivery CoE (IDC)

Xerox Services India
4th Floor Tapasya, Infopark,
Kochi, Kerala, India 682030

m +91-989-516-8770

www.xerox.com/businessservices