Running Java Program using Eclipse on Existing Spark Cluster

2016-03-09 Thread Gaini Rajeshwar
Hi All,

I have one master and 2 workers on my local machine. I wrote the following
Java program, which computes the total length of the lines in a README.md
file (I am using a Maven project for this):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class App
{
    public static void main(String[] args)
    {
        SparkConf conf = new SparkConf()
                .setAppName("Spark Java App")
                .setMaster("spark://10.2.12.59:7077");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Read the file, map each line to its length, and sum the lengths.
        JavaRDD<String> lines = sc.textFile("README.md");
        JavaRDD<Integer> lineLengths = lines.map(s -> s.length());
        int totalLength = lineLengths.reduce((a, b) -> a + b);
        System.out.println("Total length: " + totalLength);
    }
}


When I run the above program, I get the following error:


16/03/10 11:39:00 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 10.2.12.59): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.fun$1 of type org.apache.spark.api.java.function.Function in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)



It works fine if I use .setMaster("local[2]") instead of
.setMaster("spark://10.2.12.59:7077").

It also works fine if I create a jar out of this project and submit it
using spark-submit.

The problem with .setMaster("local[2]") is that the program does not run on
the existing cluster.

Any idea how I can run this program on the existing cluster from Eclipse?
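
One workaround that is often suggested for this kind of SerializedLambda
error (a sketch only, not verified in this thread) is to point the SparkConf
at the packaged application jar, so that the compiled classes are shipped to
the executors and the lambdas can be deserialized on the workers. In the
program above, that would mean building the conf roughly like this (the jar
path is an assumption about what "mvn package" produces for this project):

    SparkConf conf = new SparkConf()
            .setAppName("Spark Java App")
            .setMaster("spark://10.2.12.59:7077")
            // Ship the packaged application jar to the executors so the
            // driver-side lambda classes can be loaded on the workers.
            .setJars(new String[] { "target/spark-java-app-1.0.jar" });
    JavaSparkContext sc = new JavaSparkContext(conf);

The jar would have to be rebuilt with "mvn package" after every code change
so the executors see the current classes.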


Thanks,

Rajeshwar Gaini.


Re: Installing Spark on Mac

2016-03-09 Thread Gaini Rajeshwar
It should just work with these steps; you don't need to configure much. As
mentioned, some settings on your machine are overriding the default Spark
settings.

Even running as the super-user should not be a problem; it works just fine
as the super-user as well.

Can you tell us which version of Java you are using? Also, can you post the
contents of each file in the conf directory?



On Thu, Mar 10, 2016 at 3:28 AM, Tristan Nixon 
wrote:

> That’s very strange. I just un-set my SPARK_HOME env param, downloaded a
> fresh 1.6.0 tarball,
> unzipped it to local dir (~/Downloads), and it ran just fine - the driver
> port is some randomly generated large number.
> So SPARK_HOME is definitely not needed to run this.
>
> Aida, you are not running this as the super-user, are you?  What versions
> of Java & Scala do you have installed?
>
> > On Mar 9, 2016, at 3:53 PM, Aida Tefera  wrote:
> >
> > Hi Jakob,
> >
> > Tried running the command env|grep SPARK; nothing comes back
> >
> > Tried env|grep Spark; which is the directory I created for Spark once I
> downloaded the tgz file; comes back with PWD=/Users/aidatefera/Spark
> >
> > Tried running ./bin/spark-shell ; comes back with same error as below;
> i.e could not bind to port 0 etc.
> >
> > Sent from my iPhone
> >
> >> On 9 Mar 2016, at 21:42, Jakob Odersky  wrote:
> >>
> >> As Tristan mentioned, it looks as though Spark is trying to bind on
> >> port 0 and then 1 (which is not allowed). Could it be that some
> >> environment variables from your previous installation attempts are
> >> polluting your configuration?
> >> What does running "env | grep SPARK" show you?
> >>
> >> Also, try running just "/bin/spark-shell" (without the --master
> >> argument), maybe your shell is doing some funky stuff with the
> >> brackets.
> >
> >
>
>
>
>


Re: GroupBy on DataFrame taking too much time

2016-01-11 Thread Gaini Rajeshwar
There is no problem with the SQL read. When I do the following, it works
fine:

val dataframe1 = sqlContext.load("jdbc", Map("url" ->
"jdbc:postgresql://localhost/customerlogs?user=postgres&password=postgres",
"dbtable" -> "customer"))

dataframe1.filter("country = 'BA'").show()
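
One thing that might help with the executor heartbeat timeouts (a sketch
only; the partition column, bounds and partition count below are assumptions,
not values from this thread) is to partition the JDBC read, so the
100-million-row table is not pulled through a single connection before the
groupBy:

val dataframe1 = sqlContext.read.format("jdbc").options(Map(
  "url" -> "jdbc:postgresql://localhost/customerlogs?user=postgres&password=postgres",
  "dbtable" -> "customer",
  // Split the read into parallel partitions; the column must be numeric,
  // and the bounds/partition count here are illustrative guesses.
  "partitionColumn" -> "customer_id",
  "lowerBound" -> "1",
  "upperBound" -> "100000000",
  "numPartitions" -> "20"
)).load()

dataframe1.groupBy("country").count().show()

With that, each task reads only a slice of the table instead of one task
fetching everything.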

On Mon, Jan 11, 2016 at 1:41 PM, Xingchi Wang <regrec...@gmail.com> wrote:

> Error happened at "Lost task 0.0 in stage 0.0". I think it is not a
> "groupBy" problem; it is an issue with the SQL read of the "customer" table.
> Please check the JDBC link and whether the data is loaded successfully.
>
> Thanks
> Xingchi
>
> 2016-01-11 15:43 GMT+08:00 Gaini Rajeshwar <raja.rajeshwar2...@gmail.com>:
>
>> Hi All,
>>
>> I have a table named *customer *(customer_id, event, country,  ) in
>> postgreSQL database. This table is having more than 100 million rows.
>>
>> I want to know number of events from each country. To achieve that i am
>> doing groupBY using spark as following.
>>
>> *val dataframe1 = sqlContext.load("jdbc", Map("url" ->
>> "jdbc:postgresql://localhost/customerlogs?user=postgres=postgres",
>> "dbtable" -> "customer"))*
>>
>>
>> *dataframe1.groupBy("country").count().show()*
>>
>> above code seems to be getting complete customer table before doing
>> groupBy. Because of that reason it is throwing the following error
>>
>> *16/01/11 12:49:04 WARN HeartbeatReceiver: Removing executor 0 with no
>> recent heartbeats: 170758 ms exceeds timeout 12 ms*
>> *16/01/11 12:49:04 ERROR TaskSchedulerImpl: Lost executor 0 on 10.2.12.59
>> <http://10.2.12.59>: Executor heartbeat timed out after 170758 ms*
>> *16/01/11 12:49:04 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID
>> 0, 10.2.12.59): ExecutorLostFailure (executor 0 exited caused by one of the
>> running tasks) Reason: Executor heartbeat timed out after 170758 ms*
>>
>> I am using spark 1.6.0
>>
>> Is there anyway i can solve this ?
>>
>> Thanks,
>> Rajeshwar Gaini.
>>
>
>


Re: Getting an error while submitting spark jar

2016-01-11 Thread Gaini Rajeshwar
Hi Sree,

I think it has to be --class mllib.perf.TestRunner instead of --class
mllib.perf.TesRunner (note the missing "t" in TesRunner).

On Mon, Jan 11, 2016 at 1:19 PM, Sree Eedupuganti  wrote:

> The way how i submitting jar
>
> hadoop@localhost:/usr/local/hadoop/spark$ ./bin/spark-submit \
> >   --class mllib.perf.TesRunner \
> >   --master spark://localhost:7077 \
> >   --executor-memory 2G \
> >   --total-executor-cores 100 \
> >   /usr/local/hadoop/spark/lib/mllib-perf-tests-assembly.jar \
> >   1000
>
> And here is my error. Spark assembly has been built with Hive, including
> Datanucleus jars on classpath
> java.lang.ClassNotFoundException: mllib.perf.TesRunner
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:538)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> hadoop@localhost:/usr/local/hadoop/spark$
>
> Thanks in Advance
>
> --
> Best Regards,
> Sreeharsha Eedupuganti
> Data Engineer
> innData Analytics Private Limited
>


XML column not supported in Database

2016-01-11 Thread Gaini Rajeshwar
Hi All,

I am using a PostgreSQL database. I am using the following JDBC call to
access a customer table (customer_id int, event text, country text,
content xml) in my database:

val dataframe1 = sqlContext.load("jdbc", Map("url" ->
"jdbc:postgresql://localhost/customerlogs?user=postgres&password=postgres",
"dbtable" -> "customer"))

When I run the above command in spark-shell, I receive the following error:

java.sql.SQLException: Unsupported type
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$getCatalystType(JDBCRDD.scala:103)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$1.apply(JDBCRDD.scala:140)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$1.apply(JDBCRDD.scala:140)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:139)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
    at org.apache.spark.sql.execution.datasources.jdbc.DefaultSource.createRelation(DefaultSource.scala:60)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
    at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1153)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
    at $iwC$$iwC$$iwC.<init>(<console>:38)
    at $iwC$$iwC.<init>(<console>:40)
    at $iwC.<init>(<console>:42)
    at <init>(<console>:44)
    at .<init>(<console>:48)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Is the xml column type not supported in Spark yet? Is there any way to fix
this issue?
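
One workaround I have seen for unsupported JDBC column types (a sketch only,
not verified here) is to push a cast down to PostgreSQL by passing a subquery
as the dbtable, so that Spark only ever sees types it understands; the
subquery alias below is made up:

val dataframe1 = sqlContext.load("jdbc", Map(
  "url" -> "jdbc:postgresql://localhost/customerlogs?user=postgres&password=postgres",
  // Cast the xml column to text on the database side before Spark reads it.
  "dbtable" -> "(SELECT customer_id, event, country, content::text AS content FROM customer) AS customer_text"
))

The content column then arrives as a plain string and can be parsed further
in Spark if needed.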

Thanks,
Rajeshwar Gaini.


Re: XML column not supported in Database

2016-01-11 Thread Gaini Rajeshwar
Hi Reynold,

I created an issue in JIRA: SPARK-12764
<https://issues.apache.org/jira/browse/SPARK-12764>

On Tue, Jan 12, 2016 at 4:55 AM, Reynold Xin <r...@databricks.com> wrote:

> Can you file a JIRA ticket? Thanks.
>
> The URL is issues.apache.org/jira/browse/SPARK
>
> On Mon, Jan 11, 2016 at 1:44 AM, Gaini Rajeshwar <
> raja.rajeshwar2...@gmail.com> wrote:
>
>> Hi All,
>>
>> I am using PostgreSQL database. I am using the following jdbc call to
>> access a customer table (*customer_id int, event text, country text,
>> content xml)* in my database.
>>
>> *val dataframe1 = sqlContext.load("jdbc", Map("url" ->
>> "jdbc:postgresql://localhost/customerlogs?user=postgres=postgres",
>> "dbtable" -> "customer"))*
>>
>> When i run above command in spark-shell i receive the following error.
>>
>> *java.sql.SQLException: Unsupported type *
>> * at
>> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$getCatalystType(JDBCRDD.scala:103)*
>> * at
>> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$1.apply(JDBCRDD.scala:140)*
>> * at
>> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$1.apply(JDBCRDD.scala:140)*
>> * at scala.Option.getOrElse(Option.scala:120)*
>> * at
>> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:139)*
>> * at
>> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.(JDBCRelation.scala:91)*
>> * at
>> org.apache.spark.sql.execution.datasources.jdbc.DefaultSource.createRelation(DefaultSource.scala:60)*
>> * at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)*
>> * at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)*
>> * at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1153)*
>> * at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:25)*
>> * at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:30)*
>> * at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:32)*
>> * at $iwC$$iwC$$iwC$$iwC$$iwC.(:34)*
>> * at $iwC$$iwC$$iwC$$iwC.(:36)*
>> * at $iwC$$iwC$$iwC.(:38)*
>> * at $iwC$$iwC.(:40)*
>> * at $iwC.(:42)*
>> * at (:44)*
>> * at .(:48)*
>> * at .()*
>> * at .(:7)*
>> * at .()*
>> * at $print()*
>> * at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)*
>> * at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)*
>> * at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)*
>> * at java.lang.reflect.Method.invoke(Method.java:497)*
>> * at
>> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)*
>> * at
>> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)*
>> * at
>> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)*
>> * at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)*
>> * at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)*
>> * at
>> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)*
>> * at
>> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)*
>> * at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)*
>> * at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)*
>> * at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)*
>> * at org.apache.spark.repl.SparkILoop.org
>> <http://org.apache.spark.repl.SparkILoop.org>$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)*
>> * at
>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)*
>> * at
>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)*
>> * at
>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)*
>> * at
>> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)*
>> * at org.apache.spark.repl.SparkILoop.org
>> <http://org.apache.spark.repl.SparkILoop.org>$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)*
>> * at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)*
>> * at org.apache.spark.repl.Main$.main(Main.scala:31)*
>> * at org.apache.spark.repl.Main.main(Main.scala)*
>> * at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)*

Re: Unable to compile from source

2016-01-11 Thread Gaini Rajeshwar
Hey Hareesh,

Thanks. That solved the issue.
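
For anyone who hits the same handshake error: the disable-the-SSL-check route
from the Stack Overflow link quoted below looks roughly like this on the
Maven command line (a sketch; the wagon property names come from that answer
and are not verified here, and in some environments a proxy configuration is
what is actually needed instead):

build/mvn -Dmaven.wagon.http.ssl.insecure=true \
  -Dmaven.wagon.http.ssl.allowall=true \
  -Dmaven.wagon.http.ssl.ignore.validity.dates=true \
  -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package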

Thanks,
Rajeshwar Gaini.

On Fri, Jan 8, 2016 at 5:20 PM, hareesh makam <makamhare...@gmail.com>
wrote:

> Are you behind a proxy?
>
> Or
>
> Try disabling the SSL check while building.
>
>
> http://stackoverflow.com/questions/21252800/maven-trusting-all-certs-unlimited-java-policy
>
> Check above link to know how to disable SSL check.
>
> - hareesh.
> On Jan 8, 2016 4:54 PM, "Gaini Rajeshwar" <raja.rajeshwar2...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I am new to apache spark.
>>
>> I have downloaded *Spark 1.6.0 (Jan 04 2016) source code version*.
>>
>> I did run the following command following command as per spark
>> documentation <http://spark.apache.org/docs/latest/building-spark.html>.
>>
>> build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean 
>> package
>>
>> When run above command, i am getting the following error
>>
>> [ERROR] Failed to execute goal on project spark-catalyst_2.10: Could not 
>> resolve dependencies for project 
>> org.apache.spark:spark-catalyst_2.10:jar:1.6.0: Failed to collect 
>> dependencies at org.codehaus.janino:janino:jar:2.7.8: Failed to read 
>> artifact descriptor for org.codehaus.janino:janino:jar:2.7.8: Could not 
>> transfer artifact org.codehaus.janino:janino:pom:2.7.8 from/to central 
>> (https://repo1.maven.org/maven2): Remote host closed connection during 
>> handshake: SSL peer shut down incorrectly -> [Help 1]
>>
>> Can anyone help with this ?
>>
>>
>>


GroupBy on DataFrame taking too much time

2016-01-10 Thread Gaini Rajeshwar
Hi All,

I have a table named customer (customer_id, event, country, ...) in a
PostgreSQL database. The table has more than 100 million rows.

I want to know the number of events from each country. To achieve that, I am
doing a groupBy using Spark as follows:

val dataframe1 = sqlContext.load("jdbc", Map("url" ->
"jdbc:postgresql://localhost/customerlogs?user=postgres&password=postgres",
"dbtable" -> "customer"))

dataframe1.groupBy("country").count().show()

The above code seems to fetch the complete customer table before doing the
groupBy. Because of that, it throws the following error:

16/01/11 12:49:04 WARN HeartbeatReceiver: Removing executor 0 with no recent heartbeats: 170758 ms exceeds timeout 12 ms
16/01/11 12:49:04 ERROR TaskSchedulerImpl: Lost executor 0 on 10.2.12.59: Executor heartbeat timed out after 170758 ms
16/01/11 12:49:04 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 10.2.12.59): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 170758 ms

I am using Spark 1.6.0.

Is there any way I can solve this?

Thanks,
Rajeshwar Gaini.


Unable to compile from source

2016-01-08 Thread Gaini Rajeshwar
Hi All,

I am new to Apache Spark.

I have downloaded the Spark 1.6.0 (Jan 04 2016) source code version.

I ran the following command, as per the Spark documentation
<http://spark.apache.org/docs/latest/building-spark.html>:

build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package

When I run the above command, I get the following error:

[ERROR] Failed to execute goal on project spark-catalyst_2.10: Could
not resolve dependencies for project
org.apache.spark:spark-catalyst_2.10:jar:1.6.0: Failed to collect
dependencies at org.codehaus.janino:janino:jar:2.7.8: Failed to read
artifact descriptor for org.codehaus.janino:janino:jar:2.7.8: Could
not transfer artifact org.codehaus.janino:janino:pom:2.7.8 from/to
central (https://repo1.maven.org/maven2): Remote host closed
connection during handshake: SSL peer shut down incorrectly -> [Help
1]

Can anyone help with this?