How to consider HTML files in Spark

2015-03-12 Thread yh18190
Hi. I am very much fascinated by the Spark framework. I am trying to use PySpark + BeautifulSoup to parse HTML files, and I am facing problems loading an HTML file into BeautifulSoup. Example: filepath = file:///path to html directory def readhtml(inputhtml): soup = BeautifulSoup(inputhtml) # to load the html

Re: Unable to ship external Python libraries in PYSPARK

2014-10-07 Thread yh18190
Hi David, thanks for the reply and the effort you put into explaining the concepts, and thanks for the example. It worked.

Unable to ship external Python libraries in PYSPARK

2014-09-12 Thread yh18190
Hi all, I am currently working on PySpark for NLP processing, using the TextBlob Python library. In standalone mode it is easy to install the external Python libraries, but in cluster mode I am facing problems installing these libraries on the worker nodes remotely. I cannot access each

Request for help in writing to Textfile

2014-08-25 Thread yh18190
Hi guys, I am currently playing with huge data. I have an RDD of type RDD[List[(tuples)]], and I need only the tuples to be written to the text file output using the saveAsTextFile function. Example: val mod = modify.saveAsTextFile() returns
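A minimal sketch of one way to get only the tuples into the output, assuming the RDD is called modify, holds List[(Int, Double)] elements, and the output path is hypothetical: flatten the lists first, then format each tuple before saveAsTextFile.

    // Flatten RDD[List[(Int, Double)]] into RDD[(Int, Double)] so each tuple
    // becomes its own record instead of being printed inside a List(...) string.
    val flattened = modify.flatMap(identity)
    flattened
      .map { case (k, v) => s"$k,$v" }             // plain-text formatting per tuple
      .saveAsTextFile("hdfs:///tmp/tuples-out")    // hypothetical output directory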

Request for Help

2014-08-25 Thread yh18190
Hi guys, I just want to know whether there is any way to determine which file is being handled by Spark from a group of files given as input inside a directory. Suppose I have 1000 files given as input; I want to determine which file is currently being handled by the Spark program so that if any error
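One possible approach (not from this thread): sc.wholeTextFiles keeps the file path next to each file's content, so a failure can be traced back to the file that caused it. The directory path and the per-file logic below are placeholders; wholeTextFiles reads each file whole, so it suits many small files.

    // (path, content) pairs: the path tells you which file is being handled.
    val files = sc.wholeTextFiles("hdfs:///data/input")   // hypothetical directory of ~1000 files
    files.foreach { case (path, content) =>
      try {
        println(s"handling $path (${content.length} chars)")   // placeholder for real processing
      } catch {
        case e: Exception => println(s"failed on file $path: ${e.getMessage}")
      }
    }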

Is there a way to create a SparkContext object?

2014-05-12 Thread yh18190
Hi, could anyone suggest how we can create a SparkContext object in other classes or functions where we need to convert a Scala collection to an RDD using the sc object, e.g. sc.makeRDD(list), instead of using the main class's SparkContext object? Is there a way to pass the sc object as a parameter to
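A minimal sketch of the usual pattern: create one SparkContext in the driver's main method and pass it into helper functions as a parameter (the object and method names here are illustrative, not from the thread).

    import org.apache.spark.SparkContext

    object Driver {
      // sc is passed in; the helper never constructs its own context
      def makeRddFrom(sc: SparkContext, xs: Seq[Int]) = sc.makeRDD(xs)   // or sc.parallelize(xs)

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local[*]", "sc-as-parameter")
        val rdd = makeRddFrom(sc, List(1, 2, 3, 4))
        println(rdd.count())
        sc.stop()
      }
    }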

Regarding Partitioner

2014-04-16 Thread yh18190
Hi, I have a large dataset of elements [RDD] and I want to divide it into two exactly equal-sized partitions while maintaining the order of the elements. I tried using RangePartitioner like var data = partitionedFile.partitionBy(new RangePartitioner(2, partitionedFile)). This doesn't give satisfactory results
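One possible approach on Spark 1.0 or later, where RDD.zipWithIndex is available: attach an index to every element, then use a small custom Partitioner that sends the first half of the indexes to partition 0 and the second half to partition 1, which keeps the original order within each half. The variable rdd and its element type are assumptions.

    import org.apache.spark.Partitioner
    import org.apache.spark.SparkContext._   // pair-RDD operations on older Spark versions

    val n = rdd.count()
    val indexed = rdd.zipWithIndex().map { case (v, i) => (i, v) }   // (index, value)

    class HalfPartitioner(total: Long) extends Partitioner {
      override def numPartitions: Int = 2
      override def getPartition(key: Any): Int =
        if (key.asInstanceOf[Long] < total / 2) 0 else 1
    }

    val twoHalves = indexed.partitionBy(new HalfPartitioner(n))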

Problem with KryoSerializer

2014-04-15 Thread yh18190
Hi, I have a problem when I want to use the Spark KryoSerializer by extending the KryoRegistrator class to register custom classes in order to create objects. I am getting the following exception when I run the following program. Please let me know what could be the problem... ] (run-main)
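Since the exception itself is cut off above, here is only the usual Kryo wiring for reference, with a placeholder case class and app name; the registrator class has to be on the executors' classpath.

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoRegistrator

    case class MyRecord(id: Int, value: Double)   // placeholder custom class

    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo): Unit = {
        kryo.register(classOf[MyRecord])
      }
    }

    val conf = new SparkConf()
      .setAppName("kryo-example")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "MyRegistrator")   // use the fully-qualified name if it lives in a package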

Re: How to index each map operation?

2014-04-02 Thread yh18190
Hi Therry, thanks for the above responses. I implemented it using RangePartitioner; we need to use one of the custom partitioners in order to perform this task. Normally you can't maintain a counter, because count operations would have to be performed on each partitioned block of data...

Can we convert scala.collection.ArrayBuffer[(Int,Double)] to org.spark.RDD[(Int,Double)]

2014-03-30 Thread yh18190
Hi, can we directly convert a Scala collection to a Spark RDD data type without using the parallelize method? Is there any way to create a custom converted RDD datatype from a Scala type using some typecast like that? Please suggest.
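As far as I know there is no typecast that turns a local collection into an RDD; it has to go through the SparkContext. A minimal sketch with an illustrative buffer:

    import scala.collection.mutable.ArrayBuffer

    val buffer = ArrayBuffer((1, 1.0), (2, 2.5), (3, 3.7))
    val rdd = sc.parallelize(buffer)   // or sc.makeRDD(buffer); both accept any Seq
    // rdd: org.apache.spark.rdd.RDD[(Int, Double)]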

Zip or map elements to create new RDD

2014-03-29 Thread yh18190
Hi, I have an RDD of elements and want to create a new RDD by zipping it with another RDD in order, e.g. a result [RDD] with the sequence of elements 10,20,30,40,50... I am facing problems because the index is not an RDD; it gives an error. Could anyone help me with how we can zip it or map it in order to obtain the following
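A sketch of RDD.zip under its usual constraint that both RDDs have the same number of partitions and the same number of elements per partition; here the index is built as an RDD of the same shape instead of a local sequence (the values are illustrative).

    val values  = sc.parallelize(Seq(10, 20, 30, 40, 50), 2)
    val indices = sc.parallelize(0L until 5L, 2)     // build the index as an RDD too
    val zipped  = indices.zip(values)                // RDD[(Long, Int)]: (0,10), (1,20), ...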

Re: Zip or map elements to create new RDD

2014-03-29 Thread yh18190
Thanks Sonal. Is there any other way, like mapping values with increasing indexes, so that I can map(t => (i, t)) where the value of 'i' increases after each map operation on an element? Please help me in this regard.

How to index each map operation?

2014-03-29 Thread yh18190
Hi, I want to perform a map operation on an RDD of elements such that the resulting RDD is a key-value pair (counter, value). For example, var k: RDD[Int] = 10,20,30,40,40,60... and k.map(t => (i, t)), where the value of 'i' should act like a counter that increments after each map operation... Please help me. I tried
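On Spark 1.0 or later, RDD.zipWithIndex attaches an ordered index without trying to mutate a counter inside map (which does not work across partitions). A minimal sketch with the example values above:

    val k = sc.parallelize(Seq(10, 20, 30, 40, 40, 60))
    val counted = k.zipWithIndex().map { case (value, idx) => (idx, value) }
    // counted: RDD[(Long, Int)] = (0,10), (1,20), (2,30), ...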

Re: Splitting RDD and Grouping together to perform computation

2014-03-28 Thread yh18190
Hi, thanks Nanzhu. I tried to implement your suggestion in the following scenario: I have an RDD of, say, 24 elements, and when I partitioned it into two groups of 12 elements each, there was a loss of order of the elements within the partitions. Elements are partitioned randomly. I need to preserve the order such that the first

RE: Splitting RDD and Grouping together to perform computation

2014-03-28 Thread yh18190
Hi Andriana, thanks for the suggestion. Could you please modify the part of my code where I need to do so? I apologise for the inconvenience; because I am new to Spark I couldn't apply it appropriately. I would be thankful to you.

Splitting RDD and Grouping together to perform computation

2014-03-24 Thread yh18190
Hi, I have a large dataset of numbers, i.e. an RDD, and want to perform a computation only on a group of two values at a time. For example, 1,2,3,4,5,6,7... is an RDD. Can I group the RDD into (1,2),(3,4),(5,6)... and perform the respective computations in an efficient manner? As we don't have a way to index
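One possible sketch (not from the thread, and assuming Spark 1.0+ for zipWithIndex): index the elements, key consecutive pairs by index / 2, and apply the computation per pair; the input values are illustrative.

    import org.apache.spark.SparkContext._   // pair-RDD operations on older Spark versions

    val nums  = sc.parallelize(Seq(1, 2, 3, 4, 5, 6, 7, 8))
    val pairs = nums.zipWithIndex()
                    .map { case (v, i) => (i / 2, (i, v)) }
                    .groupByKey()
                    .map { case (_, vs) => vs.toSeq.sortBy(_._1).map(_._2) }   // keep order inside a pair
    // pairs holds Seq(1,2), Seq(3,4), ...; e.g. pairs.map(_.sum) computes per-pair sums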

Re: Splitting RDD and Grouping together to perform computation

2014-03-24 Thread yh18190
We need someone who can explain this to us with a short code snippet on the given example so that we get a clear-cut idea of RDD indexing. Guys, please help us.

Regarding Successive operation on elements and recursively

2014-03-18 Thread yh18190
Hi, I am new to the Spark/Scala environment. Currently I am working on discrete wavelet transformation algorithms on time series data. I have to perform recursive additions on successive elements in RDDs. For example, a list of elements (RDDs): a1 a2 a3 a4; the level-1 transformation: a1+a2 a3+a4 a1-a2
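A sketch of one transformation level matching the pattern described above (pairwise sums and differences, as in a Haar-style step); the input values are illustrative, and only a single level is shown, so the recursion over levels is left out.

    import org.apache.spark.SparkContext._   // pair-RDD operations on older Spark versions

    val signal = sc.parallelize(Seq(4.0, 6.0, 10.0, 12.0))              // a1 a2 a3 a4
    val paired = signal.zipWithIndex()
                       .map { case (v, i) => (i / 2, (i % 2, v)) }
                       .groupByKey()
                       .mapValues(_.toSeq.sortBy(_._1).map(_._2))       // Seq(a1,a2), Seq(a3,a4)
    val sums  = paired.mapValues { case Seq(a, b) => a + b }            // a1+a2, a3+a4
    val diffs = paired.mapValues { case Seq(a, b) => a - b }            // a1-a2, a3-a4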