Re: unsubscribe
I have already sent it at least 10 times! I sent another one today as well!

On Tue, Oct 30, 2018 at 3:51 PM Biplob Biswas wrote:
> You need to send the email to user-unsubscr...@spark.apache.org and not
> to the user group.
>
> Thanks & Regards
> Biplob Biswas
>
> On Tue, Oct 30, 2018 at 10:59 AM Anu B Nair wrote:
>> I have been sending this Unsubscribe mail for the last few months! It never happens!
>> If anyone can help us to unsubscribe, it will be really helpful!
>>
>> On Tue, Oct 30, 2018 at 3:27 PM Mohan Palavancha <mohan.palavan...@gmail.com> wrote:
Re: unsubscribe
I have been sending this Unsubscribe mail for the last few months! It never happens! If anyone can help us to unsubscribe, it will be really helpful!

On Tue, Oct 30, 2018 at 3:27 PM Mohan Palavancha wrote:
Unsubscribe
Hi, I have tried every possible way to unsubscribe from this group. Can anyone help? -- Anu
unsubscribe
Java heap space OutOfMemoryError in pyspark spark-submit (Spark version: 2.2)
Hi,

I have a data set of size 10GB (for example, Test.txt). I wrote my pyspark script (Test.py) as below:

    from pyspark import SparkConf
    from pyspark.sql import SparkSession
    from pyspark.sql import SQLContext

    spark = SparkSession.builder.appName("FilterProduct").getOrCreate()
    sc = spark.sparkContext
    sqlContext = SQLContext(sc)
    lines = spark.read.text("C:/Users/test/Desktop/Test.txt").rdd
    lines.collect()

Then I execute the script using the command below:

    spark-submit Test.py --executor-memory 15G --driver-memory 15G

Then I get an error like this:

    17/12/29 13:27:18 INFO FileScanRDD: Reading File path: file:///C:/Users/test/Desktop/Test.txt, range: 402653184-536870912, partition values: [empty row]
    17/12/29 13:27:18 INFO CodeGenerator: Code generated in 22.743725 ms
    17/12/29 13:27:44 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
    java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3230)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
        at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
        at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
        at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1786)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1189)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:383)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    17/12/29 13:27:44 ERROR Executor: Exception in task 2.0 in stage 0.0 (TID 2)
    java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3230)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)

Please let me know how to resolve this. -- Anu
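[Editor's note: two likely contributing issues in the command above, offered as a hedged suggestion. First, spark-submit treats everything after the application file as arguments to that application, so `--executor-memory 15G --driver-memory 15G` placed after Test.py is handed to the script via sys.argv and never reaches Spark. A sketch of the corrected invocation (the 15G values are the original poster's, not a recommendation):

```shell
# spark-submit options must come BEFORE the application file;
# anything after Test.py is passed to the script itself as arguments.
spark-submit --driver-memory 15G --executor-memory 15G Test.py
```

Second, even with the options applied, `lines.collect()` materializes the entire 10GB dataset in the driver JVM, which invites exactly this OutOfMemoryError; `lines.take(n)` for inspection, or keeping the work distributed (map/filter followed by a write), are the usual alternatives.]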
Fwd: [pyspark][MLlib] Getting WARN FPGrowth: Input data is not cached for cached data
Hi,

Following is my pyspark code (I have attached the input sample_fpgrowth.txt and the Python script along with this mail). Even after I have called cache(), I am getting the warning: "Input data is not cached."

    from pyspark.mllib.fpm import FPGrowth
    import pyspark
    from pyspark.context import SparkContext
    from pyspark.sql.session import SparkSession

    sc = SparkContext('local')
    data = sc.textFile("sample_fpgrowth.txt")
    transactions = data.map(lambda line: line.strip().split(' ')).cache()
    model = FPGrowth.train(transactions, minSupport=0.2, numPartitions=10)
    result = model.freqItemsets().collect()
    print(result)

I understand that it is only a warning, but I just wanted to know the reason in detail.

-- Anu

Attached sample_fpgrowth.txt:

    r z h k p
    z y x w v u t s
    s x o n r
    x z y m t s q e
    z
    x z y r q t p

- To unsubscribe e-mail: user-unsubscr...@spark.apache.org
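[Editor's note: for context on what the warned-about call computes, here is a minimal pure-Python sketch of only the first pass of FP-growth: counting which single items meet the support threshold, using the sample data attached above. It is an illustration of the counting semantics (Spark's minCount is the ceiling of minSupport times the number of records), not Spark's implementation; the `frequent_items` helper is hypothetical.]

```python
from collections import Counter
from math import ceil

# The six transactions from the attached sample_fpgrowth.txt.
transactions = [
    "r z h k p",
    "z y x w v u t s",
    "s x o n r",
    "x z y m t s q e",
    "z",
    "x z y r q t p",
]

def frequent_items(lines, min_support=0.2):
    """First pass of FP-growth: count in how many transactions each item
    appears, and keep items whose count meets ceil(min_support * n)."""
    baskets = [set(line.strip().split()) for line in lines]
    min_count = ceil(min_support * len(baskets))  # 0.2 * 6 -> 2
    counts = Counter(item for basket in baskets for item in basket)
    return {item: c for item, c in counts.items() if c >= min_count}

# 'z' appears in 5 of the 6 baskets; singletons like 'h' and 'k' are dropped.
print(frequent_items(transactions))
```

The full algorithm would then build an FP-tree over these surviving items to mine larger itemsets; this sketch only shows why minSupport=0.2 keeps an item appearing in at least 2 of the 6 baskets.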