from:"anu"

Re: unsubsribe

2018-10-30 Thread Anu B Nair

I have already send minimum 10 times! Today also I have send one!

On Tue, Oct 30, 2018 at 3:51 PM Biplob Biswas 
wrote:

> You need to send the email to user-unsubscr...@spark.apache.org and not
> to the usergroup.
>
> Thanks & Regards
> Biplob Biswas
>
>
> On Tue, Oct 30, 2018 at 10:59 AM Anu B Nair  wrote:
>
>> I am sending this Unsubscribe mail for last few months! It never happens!
>> If anyone can help us to unsubscribe it wil be really helpful!
>>
>> On Tue, Oct 30, 2018 at 3:27 PM Mohan Palavancha <
>> mohan.palavan...@gmail.com> wrote:
>>
>>>
>>>

Re: unsubsribe

2018-10-30 Thread Anu B Nair

I am sending this Unsubscribe mail for last few months! It never happens!
If anyone can help us to unsubscribe it wil be really helpful!

On Tue, Oct 30, 2018 at 3:27 PM Mohan Palavancha 
wrote:

>
>

Unsubscribe

2018-09-06 Thread Anu B Nair

Hi,

I have tried all possible way to unsubscripted from this group. Can anyone
help?

--
Anu

Hi,

I have a data set size of 10GB(example Test.txt).

I wrote my pyspark script like below(Test.py):

*from pyspark import SparkConf
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext
spark = SparkSession.builder.appName("FilterProduct").getOrCreate()
sc = spark.sparkContext
sqlContext = SQLContext(sc)
lines = spark.read.text("C:/Users/test/Desktop/Test.txt").rdd
lines.collect()*

Then I am executing the above script using below command :

spark-submit Test.py --executor-memory  15G --driver-memory 15G

Then I am getting error like below:



*17/12/29 13:27:18 INFO FileScanRDD: Reading File path:
file:///C:/Users/test/Desktop/Test.txt, range: 402653184-536870912,
partition values: [empty row]
17/12/29 13:27:18 INFO CodeGenerator: Code generated in 22.743725 ms
17/12/29 13:27:44 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3230)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at 
java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at 
org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
at 
java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
at 
java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1786)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1189)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at 
org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
at 
org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:383)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/12/29 13:27:44 ERROR Executor: Exception in task 2.0 in stage 0.0 (TID 2)
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3230)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at 
java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93*

Please let me know how to resolve this ?

--


Anu

Fwd: [pyspark][MLlib] Getting WARN FPGrowth: Input data is not cached for cached data

2017-12-21 Thread Anu B Nair

Hi,

Following is my pyspark code, (attached input sample_fpgrowth.txt and
python code along with this mail. Even after I have done cache, I am
getting Warning: Input data is not cached.

















*from pyspark.mllib.fpm import FPGrowthimport pysparkfrom pyspark.context
import SparkContextfrom pyspark.sql.session import SparkSessionsc =
SparkContext('local')data = sc.textFile("sample_fpgrowth.txt")transactions
= data.map(lambda line: line.strip().split(' ')).cache()model =
FPGrowth.train(transactions, minSupport=0.2, numPartitions=10)result =
model.freqItemsets().collect()print(result)*


Understood that it is a warning, but just wanted to know in detail

--

Anu
r z h k p
z y x w v u t s
s x o n r
x z y m t s q e
z
x z y r q t p
from pyspark.mllib.fpm import FPGrowth

import pyspark
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
sc = SparkContext('local')


data = sc.textFile("sample_fpgrowth.txt")
transactions = data.map(lambda line: line.strip().split(' ')).cache()

model = FPGrowth.train(transactions, minSupport=0.2, numPartitions=10)

result = model.freqItemsets().collect()

print(result)

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: SparkSQL Timestamp query failure

2015-03-30 Thread anu

Hi Alessandro

Could you specify which query were you able to run successfully?

1. sqlContext.sql(SELECT * FROM Logs as l where l.timestamp = '2012-10-08
16:10:36' ).collect 

OR

2. sqlContext.sql(SELECT * FROM Logs as l where cast(l.timestamp as string)
= '2012-10-08 16:10:36.0').collect 

I am able to run only the second query, i.e. the one with timestamp casted
to string. What is the use of even parsing my data to store timestamp values
when I can't do = and = comparisons on timestamp?? 

In the above query, I am ultimately doing string comparisons, while I
actually want to do comparison on timestamp values.

*My Spark version is 1.1.0*

Please somebody clarify why am I not able to perform queries like
Select * from table1 where endTime = '2015-01-01 00:00:00' and endTime =
'2015-01-10 00:00:00' 

without getting anything in the output.

Even, the following doesn't work
Select * from table1 where endTime =  CAST('2015-01-01 00:00:00' as
timestamp) and endTime = CAST('2015-01-10 00:00:00' as timestamp)

I get the this error :  *java.lang.RuntimeException: [1.99] failure:
``STRING'' expected but identifier timestamp found*

Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-Timestamp-query-failure-tp19502p22292.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Measuer Bytes READ and Peak Memory Usage for Query

2015-03-20 Thread anu

Hi All

I would like to measure Bytes Read and Peak Memory Usage for a Spark SQL
Query. 

Please clarify if Bytes Read = aggregate size of all RDDs ??
All my RDDs are in memory and 0B spill to disk.

And I am clueless how to measure Peak Memory Usage.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Measuer-Bytes-READ-and-Peak-Memory-Usage-for-Query-tp22159.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Hive on Spark with Spark as a service on CDH5.2

2015-03-17 Thread anu

*I am not clear if spark sql supports HIve on Spark when spark is run as a
service in CDH 5.2? *

Can someone please clarify this. If this is possible, how what configuration
changes have I to make to import hive context in spark shell as well as to
be able to do a spark-submit for the job to be run on the entire cluster.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Hive-on-Spark-with-Spark-as-a-service-on-CDH5-2-tp22091.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Transform a Schema RDD to another Schema RDD with a different schema

2015-03-17 Thread anu

I have a schema RDD with thw following Schema :

scala mainRDD.printSchema
root
 |-- COL1: integer (nullable = false)
 |-- COL2: integer (nullable = false)
 |-- COL3: string (nullable = true)
 |-- COL4: double (nullable = false)
 |-- COL5: string (nullable = true)



Now, I transform the mainRDD like this :


scala val sdf1 = new SimpleDateFormat(-mm-dd hh:mm:ss.SSS); val
calendar = Calendar.getInstance()
 
scala val mappedRDD : SchemaRDD = intf_ddRDD.map{ r = 
| val end_time = sdf1.parse(r(2).toString); 
| calendar.setTime(end_time); 
| val r2 = new java.sql.Timestamp(end_time.getTime); 
| val hour: Long = calendar.get(Calendar.HOUR_OF_DAY); 
| (r(0).toString.toInt, r(1).toString.toInt, r2, hour,
r(3).toString.toDouble, r(4).toString)
| }


scalamappedRDD.printSchema
root
 |-- _1: integer (nullable = false)
 |-- _2: integer (nullable = false)
 |-- _3: timestamp (nullable = true)
 |-- _4: long (nullable = false)
 |-- _5: double (nullable = false)
 |-- _6: string (nullable = true)


But the issue is, despite specifying the mainRDD as SchemaRDD, it becomes
just an RDD (notice that the column names are lost in mappedRDD)

So, how can I do the above transformation on one SchemaRDD (mainRDD) to get
another SchemaRDD (mappedRDD) with a different Schema.

Please help me out.





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Transform-a-Schema-RDD-to-another-Schema-RDD-with-a-different-schema-tp22112.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Iterate over contents of schemaRDD loaded from parquet file to extract timestamp

2015-03-16 Thread anu

Spark Version - 1.1.0
Scala - 2.10.4

I have loaded following type data from a parquet file, stored in a schemaRDD

[7654321,2015-01-01 00:00:00.007,0.49,THU]

Since, in spark version 1.1.0, parquet format doesn't support saving
timestamp valuues, I have saved the timestamp data as string. Can you please
tell me how to iterate over the data in this schema RDD to retrieve the 
timestamp values and regsietr the mapped RDD as a Table and then be able to
run queries like Select * from table where time = '2015-01-01
00:00:00.000'  . I wrote the following code :

val sdf = new SimpleDateFormat(-mm-dd hh:mm:ss.SSS); val calendar =
Calendar.getInstance()
val iddRDD = intf_ddRDD.map{ r = 

val end_time = sdf.parse(r(1).toString); 
calendar.setTime(end_time); 
val r1 = new java.sql.Timestamp(end_time.getTime); 

val hour: Long = calendar.get(Calendar.HOUR_OF_DAY); 

Row(r(0).toString.toInt, r1, hour, r(2).toString.toInt, r(3).toString)

}

This gives me * org.apache.spark.SparkException: Task not serializable*

Please help !!!




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Iterate-over-contents-of-schemaRDD-loaded-from-parquet-file-to-extract-timestamp-tp22089.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Optimizing SQL Query

2015-03-06 Thread anu

I have a query that's like:

Could you help in providing me pointers as to how to start to optimize it
w.r.t. spark sql:


sqlContext.sql(

SELECT dw.DAY_OF_WEEK, dw.HOUR, avg(dw.SDP_USAGE) AS AVG_SDP_USAGE

FROM (
   SELECT sdp.WID, DAY_OF_WEEK, HOUR, SUM(INTERVAL_VALUE) AS
SDP_USAGE

   FROM (
 SELECT *

 FROM date_d dd JOIN interval_f intf

 ON intf.DATE_WID = dd.WID

 WHERE intf.DATE_WID = 20141116 AND
intf.DATE_WID = 20141125 AND CAST(INTERVAL_END_TIME AS STRING) =
'2014-11-16 00:00:00.000' AND  CAST(INTERVAL_END_TIME 
   AS STRING) = '2014-11-26
00:00:00.000' AND MEAS_WID = 3

  ) test JOIN sdp_d sdp

   ON test.SDP_WID = sdp.WID

   WHERE sdp.UDC_ID = 'SP-1931201848'

   GROUP BY sdp.WID, DAY_OF_WEEK, HOUR, sdp.UDC_ID

   ) dw

GROUP BY dw.DAY_OF_WEEK, dw.HOUR)



Currently the query takes 15 minutes execution time where interval_f table
holds approx 170GB of raw data, date_d -- 170 MB and sdp_d -- 490MB 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Optimizing-SQL-Query-tp21948.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Re: SparkSQL Timestamp query failure

2015-03-02 Thread anu

Thank you Alessandro :)

On Tue, Mar 3, 2015 at 10:03 AM, whitebread [via Apache Spark User List] 
ml-node+s1001560n2188...@n3.nabble.com wrote:

 Anu,

 1) I defined my class Header as it follows:

 case class Header(timestamp: java.sql.Timestamp, c_ip: String,
 cs_username: String, s_ip: String, s_port: String, cs_method: String,
 cs_uri_stem: String, cs_query: String, sc_status: Int, sc_bytes: Int,
 cs_bytes: Int, time_taken: Int, User_Agent: String, Referrer: String)

 2) Defined a function to transform date to timestamp:

 implicit def date2timestamp(date: java.util.Date) = new
 java.sql.Timestamp(date.getTime)

 3) Defined the format of my timestamp

 val formatTime = new java.text.SimpleDateFormat(-MM-dd hh:mm:ss)

 4) Finally, I was able to parse my data:

 val tableMod = toProcessLogs.map(_.split( )).map(p =
 (Header(date2timestamp(formatTime3.parse(p(0)+ +p(1))),p(2), p(3), p(4),
 p(5), p(6), p(7), p(8), p(9).trim.toInt, p(10).trim.toInt,
 p(11).trim.toInt, p(12).trim.toInt, p(13), p(14

 Hope this helps,

 Alessandro

 --
  If you reply to this email, your message will be added to the discussion
 below:

 http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-Timestamp-query-failure-tp19502p21884.html
  To unsubscribe from SparkSQL Timestamp query failure, click here
 http://apache-spark-user-list.1001560.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=19502code=YW5hbWlrYS5ndW9wdGFAZ21haWwuY29tfDE5NTAyfDE1MjUxMDc5MQ==
 .
 NAML
 http://apache-spark-user-list.1001560.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewerid=instant_html%21nabble%3Aemail.namlbase=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespacebreadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-Timestamp-query-failure-tp19502p21885.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: SparkSQL Timestamp query failure

2015-03-02 Thread anu

Can you please post how did you overcome this issue.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-Timestamp-query-failure-tp19502p21868.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Facing error: java.lang.ArrayIndexOutOfBoundsException while executing SparkSQL join query

2015-02-27 Thread anu

I have three tables with the following schema:

case class *date_d*(WID: Int, CALENDAR_DATE: java.sql.Timestamp,
DATE_STRING: String, DAY_OF_WEEK: String, DAY_OF_MONTH: Int, DAY_OF_YEAR:
Int, END_OF_MONTH_FLAG: String, YEARWEEK: Int, CALENDAR_MONTH: String,
MONTH_NUM: Int, YEARMONTH: Int, QUARTER: Int, YEAR: Int)



case class *interval_f*(ORG_ID: Int, CHANNEL_WID: Int, SDP_WID: Int,
MEAS_WID: Int, DATE_WID: Int, TIME_WID: Int, VALIDATION_STATUS_CD: Int,
VAL_FAIL_CD:Int, INTERVAL_FLAG_CD: Int, CHANGE_METHOD_WID:Int,
SOURCE_LAST_UPD_TIME: java.sql.Timestamp, INTERVAL_END_TIME:
java.sql.Timestamp, LOCKED: String, EXT_VERSION_TIME: java.sql.Timestamp,
INTERVAL_VALUE: Double, INSERT_TIME: java.sql.Timestamp, LAST_UPD_TIME:
java.sql.Timestamp)



class * sdp_d*( WID :Option[Int], BATCH_ID :Option[Int], SRC_ID
:Option[String], ORG_ID :Option[Int], CLASS_WID :Option[Int], DESC_TEXT
:Option[String], PREMISE_WID :Option[Int], FEED_LOC :Option[String], GPS_LAT
:Option[Double], GPS_LONG :Option[Double], PULSE_OUTPUT_BLOCK
:Option[String], UDC_ID :Option[String], UNIVERSAL_ID :Option[String],
IS_VIRTUAL_FLG :Option[String], SEAL_INFO :Option[String], ACCESS_INFO
:Option[String], ALT_ACCESS_INFO :Option[String], LOC_INFO :Option[String],
ALT_LOC_INFO :Option[String], TYPE :Option[String], SUB_TYPE
:Option[String], TIMEZONE_ID :Option[Int], GIS_ID :Option[String],
BILLED_UPTO_TIME :Option[java.sql.Timestamp], POWER_STATUS :Option[String],
LOAD_STATUS :Option[String], BILLING_HOLD_STATUS :Option[String],
INSERT_TIME :Option[java.sql.Timestamp], LAST_UPD_TIME
:Option[java.sql.Timestamp]) extends Product{

@throws(classOf[IndexOutOfBoundsException])
override def productElement(n: Int) = n match
{
case 0 = WID; case 1 = BATCH_ID; case 2 = SRC_ID; case 3 =
ORG_ID; case 4 = CLASS_WID; case 5 = DESC_TEXT; case 6 = PREMISE_WID;
case 7 = FEED_LOC; case 8 = GPS_LAT; case 9 = GPS_LONG; case 10 =
PULSE_OUTPUT_BLOCK; case 11 = UDC_ID; case 12 = UNIVERSAL_ID; case 13 =
IS_VIRTUAL_FLG; case 14 = SEAL_INFO; case 15 = ACCESS_INFO; case 16 =
ALT_ACCESS_INFO; case 17 = LOC_INFO; case 18 = ALT_LOC_INFO; case 19 =
TYPE; case 20 = SUB_TYPE; case 21 = TIMEZONE_ID; case 22 = GIS_ID; case
23 = BILLED_UPTO_TIME; case 24 = POWER_STATUS; case 25 = LOAD_STATUS;
case 26 = BILLING_HOLD_STATUS; case 27 = INSERT_TIME; case 28 =
LAST_UPD_TIME; case _ = throw new IndexOutOfBoundsException(n.toString())
}

override def productArity: Int = 29; override def canEqual(that: Any):
Boolean = that.isInstanceOf[sdp_d]
}



Non-join queries work fine:

*val q1 = sqlContext.sql(SELECT YEAR, DAY_OF_YEAR, MAX(WID), MIN(WID),
COUNT(*) FROM date_d GROUP BY YEAR, DAY_OF_YEAR ORDER BY YEAR,
DAY_OF_YEAR)*

res4: Array[org.apache.spark.sql.Row] =
Array([2014,305,20141101,20141101,1], [2014,306,20141102,20141102,1],
[2014,307,20141103,20141103,1], [2014,308,20141104,20141104,1],
[2014,309,20141105,20141105,1], [2014,310,20141106,20141106,1],
[2014,311,20141107,20141107,1], [2014,312,20141108,20141108,1],
[2014,313,20141109,20141109,1], [2014,314,20141110,20141110,1],
[2014,315,2014,2014,1], [2014,316,20141112,20141112,1],
[2014,317,20141113,20141113,1], [2014,318,20141114,20141114,1],
[2014,319,20141115,20141115,1], [2014,320,20141116,20141116,1],
[2014,321,20141117,20141117,1], [2014,322,20141118,20141118,1],
[2014,323,20141119,20141119,1], [2014,324,20141120,20141120,1],
[2014,325,20141121,20141121,1], [2014,326,20141122,20141122,1],
[2014,327,20141123,20141123,1], [2014,328,20141...



*But the join queries throw this error:
java.lang.ArrayIndexOutOfBoundsException*

*scala val q = sqlContext.sql(select * from date_d dd join interval_f
intf on intf.DATE_WID = dd.WID Where intf.DATE_WID = 20141101 AND
intf.DATE_WID = 20141110)*

q: org.apache.spark.sql.SchemaRDD =
SchemaRDD[38] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Project
[WID#0,CALENDAR_DATE#1,DATE_STRING#2,DAY_OF_WEEK#3,DAY_OF_MONTH#4,DAY_OF_YEAR#5,END_OF_MONTH_FLAG#6,YEARWEEK#7,CALENDAR_MONTH#8,MONTH_NUM#9,YEARMONTH#10,QUARTER#11,YEAR#12,ORG_ID#13,CHANNEL_WID#14,SDP_WID#15,MEAS_WID#16,DATE_WID#17,TIME_WID#18,VALIDATION_STATUS_CD#19,VAL_FAIL_CD#20,INTERVAL_FLAG_CD#21,CHANGE_METHOD_WID#22,SOURCE_LAST_UPD_TIME#23,INTERVAL_END_TIME#24,LOCKED#25,EXT_VERSION_TIME#26,INTERVAL_VALUE#27,INSERT_TIME#28,LAST_UPD_TIME#29]
 ShuffledHashJoin [WID#0], [DATE_WID#17], BuildRight
  Exchange (HashPartitioning [WID#0], 200)
   InMemoryColumnarTableScan
[WID#0,CALENDAR_DATE#1,DATE_STRING#2,DAY_OF_WEEK#3,DAY_OF_MONTH#4,DAY_OF_YEAR#5,END_OF_MONTH_FLA...


*scala q.take(5).foreach(println)*

15/02/27 15:50:26 INFO SparkContext: Starting job: runJob at
basicOperators.scala:136
15/02/27 15:50:26 INFO DAGScheduler: Registering RDD 46 (mapPartitions at
Exchange.scala:48)
15/02/27 15:50:26 INFO FileInputFormat: Total input paths to process : 1
15/02/27 15:50:26 INFO DAGScheduler: Registering RDD 42 (mapPartitions at
Exchange.scala:48)
15/02/27 15:50:26 INFO DAGScheduler: Got job 2

Facing error while extending scala class with Product interface to overcome limit of 22 fields in spark-shell

2015-02-24 Thread anu

My issue is posted here on stack-overflow. What am I doing wrong here?

http://stackoverflow.com/questions/28689186/facing-error-while-extending-scala-class-with-product-interface-to-overcome-limi




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Facing-error-while-extending-scala-class-with-Product-interface-to-overcome-limit-of-22-fields-in-spl-tp21787.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: unsubsribe

Re: unsubsribe

Unsubscribe

Unsubscribe

Unsubscribe

Unsubscribe

unsubscribe

Java heap space OutOfMemoryError in pyspark spark-submit (spark version:2.2)

Fwd: [pyspark][MLlib] Getting WARN FPGrowth: Input data is not cached for cached data

Re: SparkSQL Timestamp query failure

Measuer Bytes READ and Peak Memory Usage for Query

Hive on Spark with Spark as a service on CDH5.2

Transform a Schema RDD to another Schema RDD with a different schema

Iterate over contents of schemaRDD loaded from parquet file to extract timestamp

Optimizing SQL Query

Re: SparkSQL Timestamp query failure

Re: SparkSQL Timestamp query failure

Facing error: java.lang.ArrayIndexOutOfBoundsException while executing SparkSQL join query

Facing error while extending scala class with Product interface to overcome limit of 22 fields in spark-shell

19 matches

Site Navigation

Mail list logo

Footer information