= 8gb, executor-cores = 4
Memory:
8 GB (about 40% reserved internally) leaves ~4.8 GB for actual computation and
storage. Let's say I have not persisted anything; in that case I could use the
full ~4.8 GB per executor.
Is it possible for me to use a 400 MB file for a BROADCAST JOIN?
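As a back-of-envelope check (a sketch using this thread's numbers; the ~40% internal reservation is an assumption from the message above, and Spark's real memory accounting varies by version), the 400 MB broadcast copy must fit in every executor's usable pool, and on the driver:

```python
# Rough feasibility check: can a ~400 MB broadcast table fit in the usable
# executor memory pool? All numbers are assumptions taken from this thread.
executor_memory_gb = 8
usable_fraction = 0.6                 # ~40% assumed reserved for internal use
usable_gb = executor_memory_gb * usable_fraction
broadcast_gb = 400 / 1024             # the broadcast copy lives in EVERY executor
print(broadcast_gb < usable_gb)       # ~0.39 GB vs ~4.8 GB
```

Even when it fits, Spark only auto-broadcasts tables smaller than spark.sql.autoBroadcastJoinThreshold (10 MB by default), so a 400 MB table needs that threshold raised or an explicit broadcast() hint on the small side.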
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து" ("Shun bribes and hold your head high")
WARN TaskSetManager:66 - Lost task 0.0 in stage 8.0 (TID 14, localhost,
executor driver): java.lang.IllegalArgumentException: image == null!
at javax.imageio.ImageTypeSpecifier.createFromRenderedImage(Unknown Source)
Dear All,
I read about higher-order functions on the Databricks blog:
https://docs.databricks.com/spark/latest/spark-sql/higher-order-functions-lambda-functions.html
Is higher-order function support available in our (open-source) Spark?
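For reference: higher-order SQL functions (transform, filter, exists, aggregate) landed in open-source Spark with version 2.4; earlier versions need a UDF. A minimal sketch, assuming Spark >= 2.4 and an active SparkSession named spark:

```python
# Sketch only: assumes open-source Spark >= 2.4 and an existing SparkSession.
df = spark.sql("SELECT transform(array(1, 2, 3), x -> x + 1) AS plus_one")
df.show()  # the array column becomes [2, 3, 4]
```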
All used cores aren't getting reported correctly in EMR, and YARN itself
> has no control over it, so whatever you put in `spark.executor.cores` will
> be used,
> but in the ResourceManager you will only see 1 vcore used per nodemanager.
>
> On Mon, Feb 26, 2018 at 5:20 AM, Selvam
; enough memory.
>
> You see 5 executors because 4 are for the job and one is for the application
> master.
>
> See the used memory and the total memory.
>
> On Mon, Feb 26, 2018 at 12:20 PM, Selvam Raman <sel...@gmail.com> wrote:
>
>> Hi,
>>
>> spark version
: 2500.054
BogoMIPS: 5000.10
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-7
On Mon, Feb 26, 2018 at 10:20 AM, Selvam Raman <sel...@gmail.
e 20g + 10% overhead RAM (22 GB),
10 cores (number of threads), 1 vCore (CPU).
Please correct me if my understanding is wrong.
How can I utilize the vCores in EMR effectively? Will vCores boost
performance?
.scala:294)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:158)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
left out the exception. On one hand I’m also not sure how well
> spacy serializes, so to debug this I would start off by moving the nlp =
> inside of my function and see if it still fails.
>
> On Thu, Feb 15, 2018 at 9:08 PM Selvam Raman <sel...@gmail.com> wrote:
>
>> imp
.
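The advice above (construct the model inside the function rather than in the driver closure) can be sketched like this; `load_model` is a stand-in for something like `spacy.load(...)`, which is an assumption:

```python
# Sketch: initialise a heavy, possibly-unpicklable object inside the worker
# function instead of capturing it in the closure that Spark pickles.
# `load_model` is a placeholder for the real model constructor.
_model = None

def load_model():
    return str.upper          # stand-in for an expensive NLP model

def get_phrases(rows):
    global _model
    if _model is None:        # built once per Python worker, never pickled
        _model = load_model()
    for row in rows:
        yield _model(row)

print(list(get_phrases(["a", "b"])))
```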
def f(x): print(x)

description = xmlData.filter(col("dcterms:description").isNotNull()) \
    .select(col("dcterms:description").alias("desc"))
description.rdd.flatMap(lambda row: getPhrases(row.desc)).foreach(f)

When I try to run getPhrases I get the error below:
hon3.6/site-packages/pyspark/rdd.py", line 906, in fold
    vals = self.mapPartitions(func).collect()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyspark/rdd.py", line 809, in collect
    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jr
ns/3.6/lib/python3.6/pickle.py", line 476, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 476, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyspark/cloudpickle.py", line 368, in save_builtin_function
    return self.save_function(obj)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyspark/cloudpickle.py", line 247, in save_function
    if islambda(obj) or obj.__code__.co_filename == '' or themodule is None:
AttributeError: 'builtin_function_or_method' object has no attribute '__code__'
please help me.
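The AttributeError likely comes from the old cloudpickle bundled with that PySpark, which assumes every function it serializes has a __code__ attribute; builtins such as print do not. A minimal illustration of the difference:

```python
# Builtins lack __code__, which is what the cloudpickle in the traceback
# trips over; a module-level def has one and serializes cleanly.
def show(x):
    print(x)

print(hasattr(print, "__code__"), hasattr(show, "__code__"))  # False True
```

The underlying cloudpickle bug was fixed in later releases, so upgrading PySpark is the cleaner fix; otherwise, avoid handing builtins (directly or via closures) to cloudpickle.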
> val empInfoSchema = ArrayType(employeeSchema)
>
> empInfoSchema.json
>
> val empInfoStrDF = Seq((emp_info)).toDF("emp_info_str")
> empInfoStrDF.printSchema
> empInfoStrDF.show(false)
>
> val empInfoDF = empInfoStrDF.select(from_json('emp_info_str,
> empInfoSchema).as("emp_info"))
> empInfoDF.printSchema
>
> empInfoDF.select(struct("*")).show(false)
>
> empInfoDF.select("emp_info.name", "emp_info.address",
> "emp_info.docs").show(false)
>
> empInfoDF.select(explode('emp_info.getItem("name"))).show
Can I get those details?
I frequently get YARN OOM and disk-full issues.
Could you please share your thoughts?
)
How can I achieve the same DataFrame while reading from the source?
doc = spark.read.text("/Users/rs/Desktop/nohup.out")
How can I create an array-type "sentences" column from doc (a DataFrame)?
The line below creates more than one column:
rdd.map(lambda rdd: rdd[0]).map(lambda row: row.split(" "))
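One way to keep everything in a single array column is the split function; a sketch, assuming an active SparkSession named spark (Spark 2.x API):

```python
# Sketch: split() yields one array<string> column instead of many columns.
from pyspark.sql.functions import split

doc = spark.read.text("/Users/rs/Desktop/nohup.out")
sentences = doc.select(split(doc["value"], " ").alias("sentences"))
sentences.printSchema()  # sentences: array<string>
```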
# -XX:OnOutOfMemoryError="kill -9 %p"
# Executing /bin/sh -c "kill -9 15090"...
Killed
Node-45.dev has 8.9 GB free while it throws out of memory. Can anyone
please help me understand the issue?
On Mon, Apr 24, 2017 at 11:22 AM, Selvam Raman <sel...@gmail.com> wrote:
> Hi,
&g
ecutors 4
--executor-cores 2 --executor-memory 20g Word2VecExample.py
ne 681, in _batch_setitems
    save(v)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 317, in save
    self.save_global(obj, rv)
  File "/Users/rs/Downloads/spark-2.0.1-bin-hadoop2.7/python/pyspark/cloudpickle.py", line 390, in save_
Test2 2 1
Test3 3 2
Current approach:
1) Delete rows in table1 where table1.composite_key = table2.composite_key.
2) Union table1 and table2 to get the updated result.
Is this the right approach? Is there another way to achieve it?
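In DataFrame terms this delete-then-union is usually expressed as a left_anti join on the composite key followed by a union, e.g. table1.join(table2, keys, "left_anti").union(table2) (hedged: schemas and column order must line up). The same logic in plain Python, with made-up rows:

```python
# Pure-Python shape of "delete matching composite keys, then union".
# Rows are (key1, key2, value); the data here is illustrative only.
table1 = [("Test1", 1, 0), ("Test2", 2, 1)]
table2 = [("Test2", 2, 9), ("Test3", 3, 2)]

keys2 = {row[:2] for row in table2}               # composite keys in table2
kept = [r for r in table1 if r[:2] not in keys2]  # the "left_anti" step
result = sorted(kept + table2)                    # the "union" step
print(result)
```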
In Scala:
val ds = sqlContext.read.text("/home/spark/1.6/lines").as[String]
What is the equivalent code in PySpark?
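PySpark has no typed Dataset API, so the closest equivalent keeps a one-column DataFrame of strings (column name "value"), or drops to an RDD of str. A sketch, assuming the same sqlContext:

```python
# Sketch: PySpark equivalents of .as[String] (no typed Dataset in Python).
df = sqlContext.read.text("/home/spark/1.6/lines")  # DataFrame[value: string]
strings = df.rdd.map(lambda row: row.value)         # RDD of plain Python str
```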
ption easily.
>>
>> Now you need to write your own UDF; maybe that can do what you want.
>>
>> Yong
>>
>> --
>> *From:* Selvam Raman <sel...@gmail.com>
>> *Sent:* Thursday, March 23, 2017 5:03 PM
>> *To:* user
>> *Subject:* how to read object field w
9":{}
}
I have bzip-compressed JSON files in the format above.
Some JSON rows contain two objects within source (like F1 and F2), some
five (F1, F2, F3, F4, F5), etc. So the final schema contains the combination
of all objects seen for the source field.
Now, every row will contain n objects but only some
There is one column which is a LONGBLOB; if I convert it with
unbase64, I hit this problem. I am able to write to Parquet without the
conversion.
So is there some limit on bytes per line? Please give me your suggestions.
Hi,
Is there a way to read .xls and .xlsx files using Spark?
Is there any Hadoop InputFormat available for .xls and .xlsx files that
could be used in Spark?
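There is no built-in Excel source in Spark. Two common routes, both hedged suggestions: the third-party spark-excel package (com.crealytics:spark-excel), or reading with pandas and converting, which works when the file fits on one machine:

```python
# Sketch: assumes pandas plus an Excel engine (e.g. openpyxl) are installed,
# an active SparkSession named spark, and an example path.
import pandas as pd

pdf = pd.read_excel("/tmp/report.xlsx")  # pandas reads .xls/.xlsx locally
sdf = spark.createDataFrame(pdf)         # then distribute as a Spark DataFrame
```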
Hi,
How can I take a heap dump on an EMR slave node to analyze?
I have one master and two slaves.
If I enter the jps command on the master, I can see SparkSubmit with its PID,
but I cannot see anything on the slave nodes.
How can I take a heap dump for a Spark job?
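Executors run on the core ("slave") nodes, which is why jps on the master only shows the driver side. A sketch of taking a dump on a core node (host name, user, and PID are examples, not real values):

```shell
# SSH to the core node, find the executor JVM, then dump its heap.
ssh hadoop@ip-10-0-0-45     # example core-node address
sudo jps -l                 # look for CoarseGrainedExecutorBackend
sudo -u yarn jmap -dump:live,format=b,file=/tmp/executor.hprof 12345
# 12345 is the example executor PID; open the .hprof in MAT or jvisualvm.
```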
nUID = 1L;

        @Override
        public void call(Iterator<Row> rows) throws Exception {
            while (rows.hasNext()) {
                Row row = rows.next();  // advance the iterator
                // Process the row and insert into the NoSQL DB
            }
        }
    });
}
}
Now, where can I apply rdd.checkpoint()?
Thanks,
Selvam
On Thu, Dec 15, 2016 at 10:44 PM, Selvam Raman <sel...@gmail.com> w
;.
> This will store checkpoints on that directory that I called checkpoint.
>
>
> Thank You,
>
> Irving Duran
>
> On Thu, Dec 15, 2016 at 10:33 AM, Selvam Raman <sel...@gmail.com> wrote:
>
>> Hi,
>>
>> is there any provision in spark batch for chec
or is there a way to provide checkpointing?
What I expect from checkpointing is to restart from partition 71 and run
through to the end.
Please give me your suggestions.
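Batch RDD checkpointing does exist; a sketch in PySpark terms (names and paths assumed). One caveat, hedged: checkpoint() truncates lineage for recovery inside a job, but it will not make a fresh run skip partitions 1-70 — resuming from partition 71 needs your own bookkeeping of processed partitions:

```python
# Sketch: set a checkpoint dir, mark the RDD BEFORE the first action, then act.
sc.setCheckpointDir("hdfs:///tmp/checkpoints")  # example path
rdd = sc.textFile("hdfs:///data/input")         # example input
rdd.checkpoint()                                # must precede the action
rdd.foreachPartition(process)                   # `process` = your function
```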
times (and wanting to spare the resources
> of our submitting machines) we have now switched to use yarn cluster mode
> by default. This seems to resolve the problem.
>
> Hope this helps,
>
> Daniel
>
> On 29 Nov 2016 11:20 p.m., "Selvam Raman" <sel...@gmail.com> w
.
Spark version: 2.0 (AWS EMR).
Field type in Cassandra: List
I am trying to insert Collections.emptyList() from Spark into a Cassandra
list field. Cassandra stores it as a null object.
How can I avoid null values here?
584 bytes) compared to data it contains. b has just 85 rows and
> around 4964 bytes.
> Help is very much appreciated!!
>
> Thanks
> Swapnil
>
>
>
"file:///Users/rs/parti").rdd.partitions.length
res4: Int = 5
So how does Spark decide the partitioning of Parquet data?
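As a rough rule (hedged: exact behavior depends on Spark version, the Parquet row-group layout, and settings like spark.sql.files.maxPartitionBytes in Spark 2.x), a file scan is split into chunks of at most the max partition size. A hypothetical 600 MB input at the 128 MB default:

```python
import math

# Hypothetical sizes: 600 MB of parquet parts, 128 MB default split size.
max_partition_bytes = 128 * 1024 * 1024
file_bytes = 600 * 1024 * 1024
print(math.ceil(file_bytes / max_partition_bytes))  # 5 partitions
```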
l data from the table. Why is it reading all the data from the table and
doing a sort-merge join for 3 or 4 tables? Why is it not applying any filter
values?
Though I have given the executors large memory, it still throws the same
error. When Spark SQL performs the join, how does it utilize memory and cores?
Any guidelines would be greatly welcome.
have faced the problem earlier.
Thanks,
Selvam R
On Mon, Oct 24, 2016 at 10:23 AM, Selvam Raman <sel...@gmail.com> wrote:
> Hi All,
>
> Please help me.
>
> I have data for 10 tables as Parquet files in S3.
>
> I am reading each into a Dataset and registering it as a temp table.
a:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I found it. We can use pivot, which is similar to CROSSTAB in Postgres.
Thank you.
On Oct 17, 2016 10:00 PM, "Selvam Raman" <sel...@gmail.com> wrote:
> Hi,
>
> Please share me some idea if you work on this earlier.
> How can i develop postgres CROSSTAB function in
Hi,
I have 40+ structured datasets stored in an S3 bucket as Parquet files.
I am going to use 20 tables in this use case.
There is a main table which drives the whole flow; it contains 1k
records.
My use case: for every record in the main table, process the rest of the
tables (join, group by
--+++
test1| val2 | val3 |
test2| val6 | val7 |
What I am trying to achieve:
trigger a query to get numbers (i.e., 1, 2, 3, ..., n), and
for every number trigger another 3 queries.
Thanks,
Selvam R
On Wed, Oct 12, 2016 at 4:10 PM, Selvam Raman <sel...@gmail.com> wrote:
> Hi ,
>
> I am reading parquet file and creating temp t
ang.Thread.run(Thread.java:745)
16/10/12 15:59:53 INFO SparkContext: Invoking stop() from shutdown hook
Please let me know if I am missing anything. Thank you for the help.
I mentioned parquet as input format.
On Oct 10, 2016 11:06 PM, "ayan guha" <guha.a...@gmail.com> wrote:
> It really depends on the input format used.
> On 11 Oct 2016 08:46, "Selvam Raman" <sel...@gmail.com> wrote:
>
>> Hi,
>>
>> How spar
RDD, then we can look at partitions.size (or length) to check
how many partitions a file has. But how is this accomplished for an S3
bucket?
aFrame is not yet supported.
>
> There is an issue open[2]. I hope this is helpful.
>
> Thanks.
>
> [1] https://github.com/apache/spark/blob/27209252f09ff73c58e60c6df8aaba
> 73b308088c/sql/core/src/main/scala/org/apache/spark/sql/
> DataFrameReader.scala#L369
> [2] https://
.
Hi,
I need your input to make a decision.
We have n databases (i.e. Oracle, MySQL, etc.). I want to read data from
these sources, but how is fault tolerance maintained on the source side?
If a source system goes down, how does Spark read the data?
ues)),schema)
In the schema fields I have declared the timestamp as
*StructField*("shipped_datetime", *DateType*),
and when I try to show the result, it throws "java.util.Date cannot be
converted to java.sql.Date".
How can I solve this issue?
First I converted the cassandrascanrdd to
own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damag
It's very urgent. Please help me, guys.
On Sun, Sep 4, 2016 at 8:05 PM, Selvam Raman <sel...@gmail.com> wrote:
> Please help me to solve the issue.
>
> spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.10:1.3.0
> --conf spark.cassandra.connection.host
ndra.DefaultSource.createRelation(DefaultSource.scala:56)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
a
Please give me any suggestions in terms of DataFrames.
from being read as null though, it
> will only skip writing tombstones.
>
> On Thu, Aug 25, 2016, 1:23 PM Selvam Raman <sel...@gmail.com> wrote:
>
>> Hi ,
>>
>> Dataframe:
>> colA colB colC colD colE
>> 1 2 3 4 5
>> 1 2 3 null null
>> 1 null
)
Record 2:(1,2,3)
Record 3:(1,5)
Record 4:(3,4,5)
m.invalid>
> wrote:
>
>> Hi,
>>
>> in the following Window spec I want orderBy ("") to be displayed
>> in descending order please
>>
>> val W = Window.partitionBy("col1").orderBy("col2")
>>
>> If I Do
>>
>
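For descending order, sort by the column expression's desc; a sketch (PySpark imports shown; the Scala form Window.partitionBy("col1").orderBy(col("col2").desc) is analogous):

```python
# Sketch: a Window ordered descending by col2 (assumes an active Spark session).
from pyspark.sql.functions import col
from pyspark.sql.window import Window

W = Window.partitionBy("col1").orderBy(col("col2").desc())
```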
() *function*, but it gives only null values for the string; the same happens
with the *to_date()* function.
qlContext.sql("select site,valudf(collect_set(requests)) as test
from sel group by site").first
selvam R
Example:
sel1 test
sel1 test
sel1 ok
sel2 ok
sel2 test
expected result:
sel1, [test,ok]
sel2,[test,ok]
How can I achieve the above result using Spark DataFrames? Please advise.
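With DataFrames this is typically df.groupBy("c1").agg(collect_set("c2")) — collect_set drops duplicates; the column names are assumptions. The same grouping in plain Python, on the sample rows above:

```python
from collections import defaultdict

# Plain-Python shape of groupBy + collect_set on the sample rows above.
rows = [("sel1", "test"), ("sel1", "test"), ("sel1", "ok"),
        ("sel2", "ok"), ("sel2", "test")]
grouped = defaultdict(set)
for key, value in rows:
    grouped[key].add(value)          # a set de-duplicates, like collect_set
print({k: sorted(v) for k, v in grouped.items()})
```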
.
Hi Team,
How can I use Spark as the execution engine in Sqoop2? I see the patch
(SQOOP-1532 <https://issues.apache.org/jira/browse/SQOOP-1532>), but it
shows as in progress.
So can we not use Sqoop on Spark?
Please help me if you have any ideas.
Hi,
What is skewed data?
I read that if the data is skewed, a join takes a long time to finish
(99 percent of tasks finish in seconds while 1 percent take minutes to
hours).
How do I handle skewed data in Spark?
Thanks,
Selvam R
+91-97877-87724
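One common mitigation for the skewed-join question above is key salting: append a small suffix so one hot key spreads across several tasks (in a real join, the other side is replicated once per salt value). A toy sketch of just the key transformation, with made-up rows:

```python
# Toy salting sketch: a single hot key fans out over N sub-keys, so its rows
# no longer all land in one task.
N = 4
rows = [("hot", i) for i in range(8)] + [("cold", 0)]
salted = [((key, i % N), value) for i, (key, value) in enumerate(rows)]

hot_buckets = {salt for (key, salt), _ in salted if key == "hot"}
print(sorted(hot_buckets))  # the hot key now occupies N buckets
```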
Hi,
How do I connect to SparkR (which is available in a Linux environment) from
RStudio (in a Windows environment)?
Please help me.
XGBoost4J can integrate with Spark from version 1.6.
Currently I am using Spark 1.5.2. Can I use XGBoost instead of XGBoost4J?
Will both provide the same result?
Thanks,
Selvam R
+91-97877-87724
On Mar 15, 2016 9:23 PM, "Nan Zhu" wrote:
> Dear Spark Users and
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
... 3 more
Ql.scala:1217)
ny file (eg. textFile()) work as well?
>
> I think this is related to this thread:
> http://apache-spark-user-list.1001560.n3.nabble.com/Error-while-running-example-scala-application-using-spark-submit-td10056.html
>
>
>
> 2016-03-30 12:44 GMT+09:00 Selvam Raman <sel..
]