reading files with .list extension

2016-10-15 Thread Hafiz Mujadid
hi,


I want to load files with a .list extension (such as actors.list.gz) in Apache
Spark. Can anybody please suggest a Hadoop input format for this kind of file?
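A possible approach, sketched here and not from the original thread: Spark's plain
text reader already handles gzip based on the .gz suffix, so a dedicated input
format may not be needed. The path below is a placeholder.

// Assumes the .list.gz files are plain text, one record per line.
// sc.textFile uses TextInputFormat and picks the gzip codec from the .gz suffix;
// the ".list" part of the name does not matter.
val lines = sc.textFile("hdfs:///data/actors.list.gz")
println(lines.count())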


Thanks


Multiple-streaming context within a jvm

2016-10-05 Thread Hafiz Mujadid
Hi,

I am trying to use multiple streaming contexts in one Spark job. I want to
fetch user data from the users topic of Kafka and purchase data from the
purchases topic using streaming, then join the two streams and perform some
operations on the joined data. But I have read that Spark does not allow
multiple streaming contexts in a single JVM. How can I implement this use case?
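A minimal sketch of the usual workaround: a single StreamingContext can host
several input DStreams, so both topics can be read in one context and joined per
batch. It assumes the spark-streaming-kafka (0.8-style direct) connector; the
broker address, topic names, and the assumption that the Kafka message key is the
user id are placeholders.

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("JoinUsersAndPurchases").setMaster("local[*]")
val ssc = new StreamingContext(conf, Seconds(10))   // one context for both streams

val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")

// Two input DStreams on the same StreamingContext; a second context is not needed.
val users = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("users"))
val purchases = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("purchases"))

// Both streams are (key, value) pairs; assuming the key is the user id, join per batch.
val joined = users.join(purchases)
joined.print()

ssc.start()
ssc.awaitTermination()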


Thanks


java.net.URISyntaxException

2016-10-04 Thread Hafiz Mujadid
Hi,

I am trying a structured streaming example in Spark using the following piece
of code:

val spark = SparkSession
  .builder
  .appName("testingStructuredQuery")
  .master("local")
  .getOrCreate()
import spark.implicits._

val userSchema = new StructType()
  .add("name", "string")
  .add("age", "integer")

val csvDF = spark
  .readStream
  .option("sep", ",")
  .schema(userSchema) // specify the schema of the CSV files
  .csv("hdfs://192.168.23.107:9000/structuredStreaming/")
csvDF.show

When I run this code, the following exception is raised:

Exception in thread "main" java.lang.IllegalArgumentException:
java.net.URISyntaxException: Relative path in absolute URI:
file:E:/Scala-Eclips/workspace/spark2/spark-warehouse
at org.apache.hadoop.fs.Path.initialize(Path.java:206)
at org.apache.hadoop.fs.Path.(Path.java:172)
at
org.apache.spark.sql.catalyst.catalog.SessionCatalog.makeQualifiedPath(SessionCatalog.scala:114)
at
org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:145)
at
org.apache.spark.sql.catalyst.catalog.SessionCatalog.(SessionCatalog.scala:89)
at
org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:95)
at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:95)
at
org.apache.spark.sql.internal.SessionState$$anon$1.(SessionState.scala:112)
at
org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:112)
at
org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:111)
at
org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
at
org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:142)
at
org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:153)
at
org.apache.spark.sql.streaming.DataStreamReader.csv(DataStreamReader.scala:251)
at com.platalytics.spark.two.test.App$.main(App.scala:22)
at com.platalytics.spark.two.test.App.main(App.scala)


Please guide me in this regard.
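A possible workaround, sketched under two assumptions: on Windows the default
spark-warehouse path can produce this URISyntaxException, so spark.sql.warehouse.dir
is set explicitly (the path below is a placeholder); and a streaming DataFrame
cannot be printed with show(), so a console sink is started instead. This continues
the snippet above and reuses its csvDF.

// Assumed workaround: set an explicit warehouse dir so Spark does not build a
// relative file: URI on Windows (the directory is a placeholder).
val spark = SparkSession
  .builder
  .appName("testingStructuredQuery")
  .master("local")
  .config("spark.sql.warehouse.dir", "file:///E:/tmp/spark-warehouse")
  .getOrCreate()

// A streaming DataFrame needs a streaming query instead of show();
// csvDF is the streaming DataFrame defined above.
val query = csvDF.writeStream
  .format("console")
  .outputMode("append")
  .start()
query.awaitTermination()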

Thanks


State management in spark-streaming

2015-12-06 Thread Hafiz Mujadid
Hi,

I have a Spark Streaming job with MQTT as the source. It receives a continuous
stream of flame-sensor events, i.e. Fire and No-Fire. I want to emit a Fire
event when a new Fire event arrives and then ignore all subsequent Fire events
until a No-Fire event occurs. Similarly, after a No-Fire event I want to ignore
all subsequent No-Fire events and wait for the next Fire event.

Is it possible to handle this logic in a custom receiver? If yes, how can we
remember the last state (Fire/No-Fire) in the receiver?
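A sketch of an alternative that keeps the state in the streaming job rather than
in the receiver, using updateStateByKey. It assumes the events arrive as
(sensorId, status) pairs with status "Fire" or "No-Fire"; the DStream name
`events` and the checkpoint directory are placeholders.

// Stateful de-duplication: remember the last status per sensor, emit only transitions.
ssc.checkpoint("hdfs:///tmp/checkpoints")   // required for stateful operations

// State per sensor = (lastStatus, changedInThisBatch)
val withState = events.updateStateByKey[(String, Boolean)] {
  (newStatuses: Seq[String], old: Option[(String, Boolean)]) =>
    val last = old.map(_._1)
    // first status in this batch that differs from the previously remembered one
    val flipped = newStatuses.find(s => !last.contains(s))
    flipped match {
      case Some(s) => Some((s, true))                        // state changed: emit
      case None    => old.map { case (s, _) => (s, false) }  // unchanged: keep, no emit
    }
}

// One event per Fire -> No-Fire or No-Fire -> Fire transition.
val transitions = withState
  .filter { case (_, (_, changed)) => changed }
  .map { case (sensorId, (status, _)) => (sensorId, status) }
transitions.print()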


Thanks






Issue with Spark-redshift

2015-11-12 Thread Hafiz Mujadid
Hi all! 
I am trying to read data from a Redshift table using the spark-redshift
project.

Here is my code
val conf = new SparkConf().setAppName("TestApp").setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
val df1: DataFrame = sqlContext.read
.format("com.databricks.spark.redshift")
.option("url",
"jdbc:redshift://host.us-east-1.redshift.amazonaws.com:5439/test?user=uname&password=pwd")
.option("dbtable", "users")
.option("tempdir", "s3n://accesskey:secretkey@redhsift-storage/test")
.load()
df1.show()


When I run this code I get the following error:

WARN Utils$: An error occurred while trying to read the S3 bucket lifecycle
configuration
java.lang.NullPointerException
at
com.databricks.spark.redshift.Utils$.checkThatBucketHasObjectLifecycleConfiguration(Utils.scala:76)
at
com.databricks.spark.redshift.RedshiftRelation.buildScan(RedshiftRelation.scala:76)
at
org.apache.spark.sql.sources.DataSourceStrategy$$anonfun$3.apply(DataSourceStrategy.scala:50)
at
org.apache.spark.sql.sources.DataSourceStrategy$$anonfun$3.apply(DataSourceStrategy.scala:50)
at
org.apache.spark.sql.sources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:266)
at
org.apache.spark.sql.sources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:265)
at
org.apache.spark.sql.sources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:296)
at
org.apache.spark.sql.sources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:261)
at
org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:46)
at
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
at
org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
at
org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:314)
at
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
at
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:943)
at
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:941)
at
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:947)
at
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:947)
at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1269)
at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1203)
at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1262)
at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:176)
at org.apache.spark.sql.DataFrame.show(DataFrame.scala:331)
at org.apache.spark.sql.DataFrame.show(DataFrame.scala:338)
at test.TestClass$.main(TestClass.scala:21)
at test.TestClass.main(TestClass.scala)
Exception in thread "main" java.net.URISyntaxException: Relative path in
absolute URI:
s3n://redhsift-storage%5Ctest%5Cf19a1019-5b20-41e5-a8df-b6490426eb64
at java.net.URI.checkPath(URI.java:1804)
at java.net.URI.(URI.java:679)
at java.net.URI.(URI.java:781)
at com.databricks.spark.redshift.Utils$.joinUrls(Utils.scala:47)
at com.databricks.spark.redshift.Utils$.makeTempPath(Utils.scala:61)
at
com.databricks.spark.redshift.Parameters$MergedParameters.createPerQueryTempDir(Parameters.scala:78)



any suggestion?

thanks






writing to hive

2015-10-13 Thread Hafiz Mujadid
hi!

I am following a tutorial to read and write from Hive, but I am facing the
following exception when I run the code:

15/10/12 14:57:36 INFO storage.BlockManagerMaster: Registered BlockManager
15/10/12 14:57:38 INFO scheduler.EventLoggingListener: Logging events to
hdfs://host:9000/spark/logs/local-1444676256555
Exception in thread "main" java.lang.VerifyError: Bad return type
Exception Details:
  Location:
   
org/apache/spark/sql/catalyst/expressions/Pmod.inputType()Lorg/apache/spark/sql/types/AbstractDataType;
@3: areturn
  Reason:
Type 'org/apache/spark/sql/types/NumericType$' (current frame, stack[0])
is not assignable to 'org/apache/spark/sql/types/AbstractDataType' (from
method signature)
  Current Frame:
bci: @3
flags: { }
locals: { 'org/apache/spark/sql/catalyst/expressions/Pmod' }
stack: { 'org/apache/spark/sql/types/NumericType$' }
  Bytecode:
000: b200 63b0

at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2595)
at java.lang.Class.getConstructor0(Class.java:2895)
at java.lang.Class.getDeclaredConstructor(Class.java:2066)
at
org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$4.apply(FunctionRegistry.scala:267)
at
org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$4.apply(FunctionRegistry.scala:267)
at scala.util.Try$.apply(Try.scala:161)
at
org.apache.spark.sql.catalyst.analysis.FunctionRegistry$.expression(FunctionRegistry.scala:267)
at
org.apache.spark.sql.catalyst.analysis.FunctionRegistry$.(FunctionRegistry.scala:148)
at
org.apache.spark.sql.catalyst.analysis.FunctionRegistry$.(FunctionRegistry.scala)
at
org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:414)
at
org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:413)
at
org.apache.spark.sql.UDFRegistration.(UDFRegistration.scala:39)
at org.apache.spark.sql.SQLContext.(SQLContext.scala:203)
at
org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:72)


Are there any suggestions on how to read and write to Hive?

thanks






read from hive tables and write back to hive

2015-10-12 Thread Hafiz Mujadid
Hi!

How can I read/write data from/to Hive?
Is it necessary to compile Spark with the Hive profile to interact with Hive?
Which Maven dependencies are required to interact with Hive?

I could not find good step-by-step documentation on getting started with Hive.
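A minimal sketch of one way to do it, under these assumptions: a Spark build with
Hive support (built with -Phive or a prebuilt "with Hive" distribution), the
org.apache.spark:spark-hive_2.10 artifact (same version as spark-core) on the
classpath, and hive-site.xml on the classpath so the metastore can be found.
Database and table names are placeholders.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("HiveReadWrite"))
val hiveContext = new HiveContext(sc)

// Read from an existing Hive table (placeholder names).
val df = hiveContext.sql("SELECT * FROM my_db.input_table")

// Write the result back to Hive as a managed table.
df.write.mode("overwrite").saveAsTable("my_db.output_table")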

thanks






Hive with apache spark

2015-10-11 Thread Hafiz Mujadid
Hi

How can we read data from an external Hive server? The Hive server is already
running, and I want to read its data remotely using Spark. Is there any example?
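A short sketch under the assumption that the remote cluster exposes its Hive
metastore over Thrift; the metastore URI and table name are placeholders.

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
// Point the HiveContext at the remote metastore (placeholder URI), then query remotely.
hiveContext.setConf("hive.metastore.uris", "thrift://metastore-host:9083")
hiveContext.sql("SELECT * FROM remote_db.some_table").show()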


thanks






wild cards in spark sql

2015-09-02 Thread Hafiz Mujadid
Hi

Does Spark SQL support wildcards for filtering data in SQL queries, just like we
can filter data in an RDBMS with wildcards such as % and _? In other words, how
can I write the following query in Spark SQL?

select * from employee where ename like 'a%d'
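Spark SQL does accept SQL LIKE patterns; a minimal sketch, assuming the employee
table is already registered and employeeDF is the corresponding DataFrame:

// Using the SQL dialect directly:
val matched = sqlContext.sql("SELECT * FROM employee WHERE ename LIKE 'a%d'")

// Or the equivalent DataFrame expression:
val matched2 = employeeDF.filter(employeeDF("ename").like("a%d"))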

thanks






Save dataframe into hbase

2015-09-02 Thread Hafiz Mujadid
Hi 

What is an efficient way to save a DataFrame into HBase?
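One common pattern, sketched under the assumption of an HBase 1.x client on the
classpath: convert the DataFrame's rows to HBase Puts and write them with
saveAsNewAPIHadoopDataset. The table name, column family, column names, and the
choice of the first column as the row key are all placeholders.

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job

val hConf = HBaseConfiguration.create()
hConf.set(TableOutputFormat.OUTPUT_TABLE, "my_table")
val job = Job.getInstance(hConf)
job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

df.rdd.map { row =>
  val put = new Put(Bytes.toBytes(row.getString(0)))          // row key from first column
  put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes(row.getString(1)))
  (new ImmutableBytesWritable, put)
}.saveAsNewAPIHadoopDataset(job.getConfiguration)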

Thanks









reading multiple parquet file using spark sql

2015-09-01 Thread Hafiz Mujadid
Hi 

I want to read multiple Parquet files using the Spark SQL load method, just
like we can pass multiple comma-separated paths to sc.textFile. Is there any way
to do the same?
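A minimal sketch: the DataFrameReader's parquet method takes multiple paths
(Spark 1.4+); the paths below are placeholders.

val df = sqlContext.read.parquet(
  "hdfs:///data/part1.parquet",
  "hdfs:///data/part2.parquet",
  "hdfs:///data/part3.parquet")
df.printSchema()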


Thanks






Schema From parquet file

2015-09-01 Thread Hafiz Mujadid
Hi all!

Is there any way to get the schema from a Parquet file without loading the
data into a DataFrame?
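A small sketch: pointing the Parquet reader at the file and asking for .schema is
usually cheap, because only the Parquet footer metadata is read until an action
runs, so the schema can be inspected without materializing the data (the path is
a placeholder).

val schema = sqlContext.read.parquet("hdfs:///data/part1.parquet").schema
println(schema.treeString)   // no data is scanned, only the Parquet footer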

Thanks






Re: Is there any way to connect cassandra without spark-cassandra connector?

2015-08-27 Thread Hafiz Mujadid
What Maven dependencies are you using? I tried the same but got the following
exception:

Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/cassandra/cql/jdbc/AbstractJdbcType
at
org.apache.cassandra.cql.jdbc.CassandraConnection.(CassandraConnection.java:146)
at
org.apache.cassandra.cql.jdbc.CassandraDriver.connect(CassandraDriver.java:92)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:187)
at org.apache.spark.sql.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:96)
at org.apache.spark.sql.jdbc.JDBCRelation.(JDBCRelation.scala:133)
at
org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:121)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
at
com.platalytics.dataFrame.exercise.CassandraJDBC$.main(CassandraJDBC.scala:15)
at
com.platalytics.dataFrame.exercise.CassandraJDBC.main(CassandraJDBC.scala)
Caused by: java.lang.ClassNotFoundException:
org.apache.cassandra.cql.jdbc.AbstractJdbcType










Writing test case for spark streaming checkpointing

2015-08-27 Thread Hafiz Mujadid
Hi!

I have enabled checkpointing in Spark Streaming with Kafka. I can see that
Spark Streaming is checkpointing to the configured directory on HDFS. How can I
test that it works correctly and recovers with no data loss?
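A sketch of the usual recovery check, assuming the streaming setup is wrapped in
a factory function and conf is the application's SparkConf: create the context
with getOrCreate, kill and restart the driver mid-stream, and verify that
processing resumes from the checkpointed batches. The checkpoint directory and
batch interval are placeholders.

def createContext(): StreamingContext = {
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint("hdfs:///checkpoints/myApp")
  // ... set up the Kafka input stream and output operations here ...
  ssc
}

// On a clean start this calls createContext(); after a crash/restart it rebuilds
// the context (and its pending batches) from the checkpoint directory instead.
val ssc = StreamingContext.getOrCreate("hdfs:///checkpoints/myApp", createContext _)
ssc.start()
ssc.awaitTermination()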


Thanks






giving offset in spark sql

2015-08-04 Thread Hafiz Mujadid
Hi all!

I want to skip the first n rows of a DataFrame. In plain SQL this is done with
the OFFSET keyword. How can we achieve this in Spark SQL?
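A sketch of one workaround, since Spark SQL does not expose an OFFSET keyword
here: number the rows with zipWithIndex and filter. df and n are placeholders.

val n = 10                                   // number of rows to skip (placeholder)
val indexed = df.rdd.zipWithIndex()
val remaining = indexed.filter { case (_, idx) => idx >= n }.map(_._1)
val result = sqlContext.createDataFrame(remaining, df.schema)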

Thanks






Custom partitioner

2015-07-26 Thread Hafiz Mujadid
Hi

I have CSV data that contains a date-time column. I want to partition the data
into 12 partitions, each containing the data of a single month. I do not
understand how to write such a partitioner and how to use it when reading and
writing the data.

Kindly help me in this regard.
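A sketch of one way to do it: key every record by its month and use a 12-way
custom Partitioner. The input path, output path, date format, and the position
of the date-time column are placeholders.

import org.apache.spark.Partitioner

class MonthPartitioner extends Partitioner {
  override def numPartitions: Int = 12
  // key is the month number 1-12; map it to partitions 0-11
  override def getPartition(key: Any): Int = key.asInstanceOf[Int] - 1
}

val fmt = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
val byMonth = sc.textFile("hdfs:///data/input.csv").map { line =>
  val cols = line.split(",")
  val cal = java.util.Calendar.getInstance()
  cal.setTime(fmt.parse(cols(0)))            // assumes the date-time is the first column
  (cal.get(java.util.Calendar.MONTH) + 1, line)
}

byMonth
  .partitionBy(new MonthPartitioner)
  .values
  .saveAsTextFile("hdfs:///data/partitioned-by-month")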

Thanks






Converting spark JDBCRDD to DataFrame

2015-07-06 Thread Hafiz Mujadid
Hi all!

What is the most efficient way to convert a JdbcRDD to a DataFrame?

Any example?
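A sketch of one alternative rather than a direct conversion: since Spark 1.4 the
JDBC data source reads straight into a DataFrame, so the JdbcRDD can often be
skipped entirely. Connection details and the table name are placeholders, and
sqlContext is an existing SQLContext.

val df = sqlContext.read
  .format("jdbc")
  .option("url", "jdbc:mysql://db-host:3306/mydb?user=me&password=secret")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "members")
  .load()
df.show()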

Thanks






com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException in spark with mysql database

2015-07-06 Thread Hafiz Mujadid
Hi!
I am trying to load data from a MySQL database using the following code:

val query="select * from  "+table+"  "
val url = "jdbc:mysql://" + dataBaseHost + ":" + dataBasePort + "/" +
dataBaseName + "?user=" + db_user + "&password=" + db_pass
val sc = new SparkContext(new
SparkConf().setAppName("SparkJdbcDs").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)
val options = new HashMap[String, String]()
options.put("driver", "com.mysql.jdbc.Driver")
options.put("url", url)
options.put("dbtable", query)
options.put("numPartitions", "1")
sqlContext.load("jdbc", options)

And I get the following exception:

Exception in thread "main"
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error
in your SQL syntax; check the manual that corresponds to your MySQL server
version for the right syntax to use near 'select * from  tempTable   WHERE
1=0'
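A sketch of a possible fix: the dbtable option must be something valid in a FROM
clause (a table name or a parenthesised subquery with an alias), not a bare
SELECT statement, because Spark wraps it in its own query (hence the generated
"select * from  ...  WHERE 1=0"). Reusing the options and table from the snippet
above:

// either pass just the table name ...
options.put("dbtable", table)
// ... or, if a query is really needed, wrap it as an aliased subquery
options.put("dbtable", "(select * from " + table + ") as tmp")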






Re: lower and upper offset not working in spark with mysql database

2015-07-05 Thread Hafiz Mujadid
thanks

On Mon, Jul 6, 2015 at 10:46 AM, Manohar753 [via Apache Spark User List] <
ml-node+s1001560n23637...@n3.nabble.com> wrote:

>  I think you should mention partitionColumn like below and the Colum type
> should be numeric. It works for my case.
>
>
>
> options.put("partitionColumn", "revision");
>
>
>
>
>
> Thanks,
>
> Manohar
>
>
>
>
>
> *From:* Hafiz Mujadid [via Apache Spark User List] [mailto:ml-node+[hidden
> email] <http:///user/SendEmail.jtp?type=node&node=23637&i=0>]
> *Sent:* Monday, July 6, 2015 10:56 AM
> *To:* Manohar Reddy
> *Subject:* lower and upper offset not working in spark with mysql database
>
>
>
> Hi all!
>
> I am trying to read records from offset 100 to 110 from a table using
> following piece of code
>
> val sc = new SparkContext(new
> SparkConf().setAppName("SparkJdbcDs").setMaster("local[*]"))
> val sqlContext = new SQLContext(sc)
> val options = new HashMap[String, String]()
> options.put("driver", "com.mysql.jdbc.Driver")
> options.put("url",
> "jdbc:mysql://***:3306/temp?user=&password=")
> options.put("dbtable", "tempTable")
> options.put("lowerBound", "100")
> options.put("upperBound", "110")
> options.put("numPartitions", "1")
> sqlContext.load("jdbc", options)
>
>
> but this returns all the records instead of only 10 records
>



-- 
Regards: HAFIZ MUJADID





lower and upper offset not working in spark with mysql database

2015-07-05 Thread Hafiz Mujadid
Hi all!

I am trying to read records from offset 100 to 110 of a table using the
following piece of code:

val sc = new SparkContext(new
SparkConf().setAppName("SparkJdbcDs").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)
val options = new HashMap[String, String]()
options.put("driver", "com.mysql.jdbc.Driver")
options.put("url",
"jdbc:mysql://***:3306/temp?user=&password=")
options.put("dbtable", "tempTable")
options.put("lowerBound", "100")
options.put("upperBound", "110")
options.put("numPartitions", "1")
sqlContext.load("jdbc", options)


but this returns all the records instead of only 10 records
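A sketch of why this can happen and a possible workaround: lowerBound and
upperBound only describe how a numeric partitionColumn is split across
partitions; they never filter rows. To actually read rows 100-110, push the
range into the query itself. Column and table names are placeholders, and
`options` is the map from the snippet above.

// lower/upper bounds require a partitionColumn and only control the partition stride:
options.put("partitionColumn", "id")
options.put("lowerBound", "100")
options.put("upperBound", "110")
options.put("numPartitions", "1")

// to really restrict the rows, filter inside the query passed as dbtable:
options.put("dbtable", "(select * from tempTable order by id limit 10 offset 100) as tmp")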







Re: making dataframe for different types using spark-csv

2015-07-02 Thread Hafiz Mujadid
Thanks

On Thu, Jul 2, 2015 at 5:40 PM, Kohler, Curt E (ELS-STL) <
c.koh...@elsevier.com> wrote:

>  You should be able to do something like this (assuming an input file
> formatted as:  String, IntVal, LongVal)
>
>
>  import org.apache.spark.sql.types._
>
> val recSchema = StructType(List(
>   StructField("strVal", StringType, false),
>   StructField("intVal", IntegerType, false),
>   StructField("longVal", LongType, false)))
>
> val filePath = "some path to your dataset"
>
> val df1 = sqlContext.load("com.databricks.spark.csv", recSchema,
>   Map("path" -> filePath, "header" -> "false", "delimiter" -> ",", "mode" -> "FAILFAST"))
>
>   From: Hafiz Mujadid 
> Date: Wednesday, July 1, 2015 at 10:59 PM
> To: Mohammed Guller 
> Cc: Krishna Sankar , "user@spark.apache.org" <
> user@spark.apache.org>
>
> Subject: Re: making dataframe for different types using spark-csv
>
>   hi Mohammed Guller!
>
>  How can I specify schema in load method?
>
>
>
> On Thu, Jul 2, 2015 at 6:43 AM, Mohammed Guller 
> wrote:
>
>>  Another option is to provide the schema to the load method. One variant
>> of the sqlContext.load takes a schema as a input parameter. You can define
>> the schema programmatically as shown here:
>>
>>
>>
>>
>> https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
>>
>>
>>
>> Mohammed
>>
>>
>>
>> *From:* Krishna Sankar [mailto:ksanka...@gmail.com]
>> *Sent:* Wednesday, July 1, 2015 3:09 PM
>> *To:* Hafiz Mujadid
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: making dataframe for different types using spark-csv
>>
>>
>>
>> ·  use .cast("...").alias('...') after the DataFrame is read.
>>
>> ·  sql.functions.udf for any domain-specific conversions.
>>
>> Cheers
>>
>> 
>>
>>
>>
>> On Wed, Jul 1, 2015 at 11:03 AM, Hafiz Mujadid 
>> wrote:
>>
>> Hi experts!
>>
>>
>> I am using spark-csv to lead csv data into dataframe. By default it makes
>> type of each column as string. Is there some way to get dataframe of
>> actual
>> types like int,double etc.?
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>>
>
>
>
>  --
> Regards: HAFIZ MUJADID
>



-- 
Regards: HAFIZ MUJADID


Re: making dataframe for different types using spark-csv

2015-07-01 Thread Hafiz Mujadid
hi Mohammed Guller!

How can I specify schema in load method?



On Thu, Jul 2, 2015 at 6:43 AM, Mohammed Guller 
wrote:

>  Another option is to provide the schema to the load method. One variant
> of the sqlContext.load takes a schema as a input parameter. You can define
> the schema programmatically as shown here:
>
>
>
>
> https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
>
>
>
> Mohammed
>
>
>
> *From:* Krishna Sankar [mailto:ksanka...@gmail.com]
> *Sent:* Wednesday, July 1, 2015 3:09 PM
> *To:* Hafiz Mujadid
> *Cc:* user@spark.apache.org
> *Subject:* Re: making dataframe for different types using spark-csv
>
>
>
> ·  use .cast("...").alias('...') after the DataFrame is read.
>
> ·  sql.functions.udf for any domain-specific conversions.
>
> Cheers
>
> 
>
>
>
> On Wed, Jul 1, 2015 at 11:03 AM, Hafiz Mujadid 
> wrote:
>
> Hi experts!
>
>
> I am using spark-csv to lead csv data into dataframe. By default it makes
> type of each column as string. Is there some way to get dataframe of actual
> types like int,double etc.?
>
>
> Thanks
>
>
>
>
>
>



-- 
Regards: HAFIZ MUJADID


making dataframe for different types using spark-csv

2015-07-01 Thread Hafiz Mujadid
Hi experts!


I am using spark-csv to load CSV data into a DataFrame. By default it makes the
type of each column a string. Is there some way to get a DataFrame with the
actual types like int, double, etc.?
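A minimal sketch: spark-csv can infer column types with its inferSchema option
(the path is a placeholder).

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")   // sample the data and pick int/double/... types
  .load("hdfs:///data/input.csv")
df.printSchema()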


Thanks






coalesce on dataFrame

2015-07-01 Thread Hafiz Mujadid
How can we use coalesce(1, true) on a DataFrame?
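A short sketch: DataFrame.coalesce(n) exists, but without the shuffle flag; when
the shuffle = true behaviour is wanted, repartition(1) is the usual equivalent.

val singlePartition = df.coalesce(1)     // narrow dependency, no shuffle
val reshuffled     = df.repartition(1)   // forces a shuffle, like coalesce(1, true) on an RDD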


Thanks






flume sinks supported by spark streaming

2015-06-23 Thread Hafiz Mujadid
Hi!


I want to integrate Flume with Spark Streaming, and I want to know which Flume
sink types are supported by Spark Streaming. I have seen one example using an
Avro sink.
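A sketch of the push-based approach, which pairs with Flume's Avro sink; there is
also a pull-based approach using the Spark sink shipped with
spark-streaming-flume. Host, port, and the ssc variable are placeholders.

import org.apache.spark.streaming.flume.FlumeUtils

// Flume's avro sink pushes events to this host:port, where Spark runs a receiver.
val flumeStream = FlumeUtils.createStream(ssc, "localhost", 41414)
flumeStream.map(event => new String(event.event.getBody.array())).print()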

Thanks








cassandra with jdbcRDD

2015-06-16 Thread Hafiz Mujadid
hi all!


Is there a way to connect to Cassandra with JdbcRDD?








redshift spark

2015-06-05 Thread Hafiz Mujadid
Hi All,

I want to read and write data to AWS Redshift. I found the spark-redshift
project at the following address:
https://github.com/databricks/spark-redshift

Its documentation contains the following code:
import com.databricks.spark.redshift.RedshiftInputFormat

val records = sc.newAPIHadoopFile(
  path,
  classOf[RedshiftInputFormat],
  classOf[java.lang.Long],
  classOf[Array[String]])

I am unable to understand its parameters. Can somebody explain how to use this?
What is meant by path in this case?

thanks






setting spark configuration properties problem

2015-05-05 Thread Hafiz Mujadid
Hi all,

I have declared a SparkContext at the start of my program and then want to
change its configuration at some later stage in my code, as written below:

val conf = new SparkConf().setAppName("Cassandra Demo")
var sc:SparkContext=new SparkContext(conf)
sc.getConf.set("spark.cassandra.connection.host", host)
println(sc.getConf.get("spark.cassandra.connection.host"))

In the above code the following exception occurs:

Exception in thread "main" java.util.NoSuchElementException:
spark.cassandra.connection.host

Actually, I want to set the Cassandra host property only when there is some job
related to Cassandra, so I declare the SparkContext earlier and want to set this
property at some later stage.

Any suggestion?
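A sketch of the usual fix: SparkConf is copied when the SparkContext is
constructed, and the copy returned by sc.getConf does not feed back into the
running context, so the property has to be set on the conf before the context is
created. The host value is a placeholder.

val conf = new SparkConf()
  .setAppName("Cassandra Demo")
  .set("spark.cassandra.connection.host", host)   // set before the context exists
val sc = new SparkContext(conf)
println(sc.getConf.get("spark.cassandra.connection.host"))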






empty jdbc RDD in spark

2015-05-02 Thread Hafiz Mujadid
Hi all!
I am trying to read a HANA database using Spark's JdbcRDD. Here is my code:
def readFromHana() {
  val conf = new SparkConf()
  conf.setAppName("test").setMaster("local")
  val sc = new SparkContext(conf)
  val rdd = new JdbcRDD(sc, () => {
      Class.forName("com.sap.db.jdbc.Driver").newInstance()
      DriverManager.getConnection(
        "jdbc:sap://54.69.200.113:30015/?currentschema=LIVE2",
        "mujadid", "786Xyz123")
    },
    "SELECT * FROM MEMBERS LIMIT ? OFFSET ?",
    0, 100, 1,
    (r: ResultSet) => convert(r))
  println(rdd.count())
  sc.stop()
}

def convert(rs: ResultSet): String = {
  val rsmd = rs.getMetaData()
  val numberOfColumns = rsmd.getColumnCount()
  var i = 1
  val row = new StringBuilder
  while (i <= numberOfColumns) {
    row.append(rs.getString(i) + ",")
    i += 1
  }
  row.toString()
}

The resultant count is 0

Any suggestion?
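A sketch of a possible cause: JdbcRDD substitutes the partition's lower bound
into the first "?" and the upper bound into the second, so with a lower bound of
0 the statement effectively runs as "... LIMIT 0 OFFSET 100" and returns nothing.
Using a non-zero lower bound such as 1 keeps the LIMIT positive; the rest mirrors
the snippet above.

val rdd = new JdbcRDD(sc, () => {
    Class.forName("com.sap.db.jdbc.Driver").newInstance()
    DriverManager.getConnection(
      "jdbc:sap://54.69.200.113:30015/?currentschema=LIVE2", "mujadid", "786Xyz123")
  },
  "SELECT * FROM MEMBERS LIMIT ? OFFSET ?",
  1, 100, 1,                       // bounds 1..100 -> first "?" becomes 1, not 0
  (r: ResultSet) => convert(r))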

Thanks






sap hana database load using jdbcRDD

2015-04-30 Thread Hafiz Mujadid
Hi !

Can we load a HANA database table using Spark's JdbcRDD?

Thanks






saving schemaRDD to cassandra

2015-03-27 Thread Hafiz Mujadid
Hi experts!

I would like to know whether there is any way to store a SchemaRDD to Cassandra.
If yes, how do I store it into an existing Cassandra column family, and how into
a new column family?
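A sketch assuming the spark-cassandra-connector is on the classpath and the
target table already exists; keyspace, table, column names, and the row-to-tuple
mapping are placeholders.

import com.datastax.spark.connector._

schemaRdd
  .map(row => (row.getString(0), row.getInt(1)))
  .saveToCassandra("my_keyspace", "my_table", SomeColumns("name", "age"))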

Thanks







Downloading data from url

2015-03-17 Thread Hafiz Mujadid
Hi experts!


Is there any API in Spark to download data from a URL? I want to download data
from URLs in a Spark application, and I want the downloading to happen on all
nodes instead of a single node.
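A sketch of one way to spread the work, since Spark core has no dedicated
download API: parallelize the URL list so each executor fetches its own URLs.
The URLs and output path are placeholders.

val urls = Seq("http://example.com/a.csv", "http://example.com/b.csv")
val lines = sc.parallelize(urls, urls.size)
  .flatMap(url => scala.io.Source.fromURL(url).getLines().toList)   // fetched on executors
lines.saveAsTextFile("hdfs:///data/downloaded")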

Thanks






connecting spark application with SAP hana

2015-03-12 Thread Hafiz Mujadid
Hi experts!

Is there any way to connect to SAP HANA from a Spark application and get data
from HANA tables into the application?
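A sketch using the generic JDBC data source (assumes Spark 1.3+ and the SAP JDBC
driver class com.sap.db.jdbc.Driver, which matches the JdbcRDD post above, on the
classpath). Host, schema, credentials, and table are placeholders.

val options = Map(
  "driver"  -> "com.sap.db.jdbc.Driver",
  "url"     -> "jdbc:sap://hana-host:30015/?currentschema=MYSCHEMA&user=me&password=secret",
  "dbtable" -> "MEMBERS")
val df = sqlContext.load("jdbc", options)
df.show()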


Thanks






spark streaming, batchinterval,windowinterval and window sliding interval difference

2015-02-26 Thread Hafiz Mujadid
Can somebody explain the difference between the batch interval, the window
length and the window sliding interval, with an example? Is there any real-time
use case for these parameters?
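A short sketch that puts the three values side by side; conf, stream, and the
numbers are placeholders.

// batch interval: the streaming clock; a new micro-batch (RDD) every 5 seconds
val ssc = new StreamingContext(conf, Seconds(5))

// window length 30 s, slide interval 10 s:
// each windowed RDD covers the last 6 batches, and a new one is emitted every 2 batches
val windowed = stream.window(Seconds(30), Seconds(10))
windowed.count().print()   // e.g. number of events seen in the last 30 seconds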


Thanks






running spark project using java -cp command

2015-02-09 Thread Hafiz Mujadid
hi experts!

Is there any way to run a Spark application using the java -cp command?


thanks 






using spark in web services

2015-02-09 Thread Hafiz Mujadid
Hi experts! I am trying to use Spark in my RESTful web services. I am using the
Scala Lift framework for writing the web services. Here is my Boot class:
class Boot extends Bootable {
  def boot {
Constants.loadConfiguration
val sc=new SparkContext(new
SparkConf().setMaster("local").setAppName("services"))
// Binding Service as a Restful API
LiftRules.statelessDispatchTable.append(RestfulService);
// resolve the trailing slash issue
LiftRules.statelessRewrite.prepend({
  case RewriteRequest(ParsePath(path, _, _, true), _, _) if path.last ==
"index" => RewriteResponse(path.init)
})

  }
}


When I remove the line

val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("services"))

then it works fine. I am starting the services using the command

java -jar start.jar jetty.port=

and I get the following exception:


ERROR net.liftweb.http.provider.HTTPProvider - Failed to Boot! Your
application may not run properly
java.lang.NoClassDefFoundError:
org/eclipse/jetty/server/handler/ContextHandler$NoContext
at
org.eclipse.jetty.servlet.ServletContextHandler.newServletHandler(ServletContextHandler.java:260)
at
org.eclipse.jetty.servlet.ServletContextHandler.getServletHandler(ServletContextHandler.java:322)
at
org.eclipse.jetty.servlet.ServletContextHandler.relinkHandlers(ServletContextHandler.java:198)
at
org.eclipse.jetty.servlet.ServletContextHandler.(ServletContextHandler.java:157)
at
org.eclipse.jetty.servlet.ServletContextHandler.(ServletContextHandler.java:135)
at
org.eclipse.jetty.servlet.ServletContextHandler.(ServletContextHandler.java:129)
at
org.eclipse.jetty.servlet.ServletContextHandler.(ServletContextHandler.java:99)
at
org.apache.spark.ui.JettyUtils$.createServletHandler(JettyUtils.scala:96)
at
org.apache.spark.ui.JettyUtils$.createServletHandler(JettyUtils.scala:87)
at org.apache.spark.ui.WebUI.attachPage(WebUI.scala:67)
at
org.apache.spark.ui.WebUI$$anonfun$attachTab$1.apply(WebUI.scala:60)
at
org.apache.spark.ui.WebUI$$anonfun$attachTab$1.apply(WebUI.scala:60)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.ui.WebUI.attachTab(WebUI.scala:60)
at org.apache.spark.ui.SparkUI.initialize(SparkUI.scala:50)
at org.apache.spark.ui.SparkUI.(SparkUI.scala:63)



Any suggestion please?

Am I using the right command to run this?








LeaseExpiredException while writing schemardd to hdfs

2015-02-03 Thread Hafiz Mujadid
I want to write a whole SchemaRDD to a single file in HDFS, but I am facing the
following exception:

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on /test/data/data1.csv (inode 402042): File does not exist. Holder
DFSClient_NONMAPREDUCE_-564238432_57 does not have any open files 

Here is my code:

rdd.foreachPartition(iterator => {
  val output = new Path(outputpath)
  val fs = FileSystem.get(new Configuration())
  val writer = new BufferedWriter(new OutputStreamWriter(fs.create(output)))
  var line = new StringBuilder
  iterator.foreach(row => {
    row.foreach(column => {
      line.append(column.toString + splitter)
    })
    writer.write(line.toString.dropRight(1))
    writer.newLine()
    line.clear
  })
  writer.close()
})

I think the problem is that I am creating a writer for each partition, and
multiple writers execute in parallel, so when they try to write to the same file
this problem appears. When I avoid this approach, I face a Task not serializable
exception instead.

Any suggestions for handling this problem?
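A sketch of one common fix: build the CSV lines with a map and let
saveAsTextFile do the writing, coalescing to one partition when a single file is
required. splitter, outputpath, and rdd are the names from the snippet above.

rdd.map(row => row.mkString(splitter))   // one CSV line per Row
  .coalesce(1)                           // single partition -> single part file
  .saveAsTextFile(outputpath)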






spark streaming kinesis issue

2015-01-20 Thread Hafiz Mujadid
Hi experts!

I am using Spark Streaming with Kinesis and getting this exception while
running the program:

 java.lang.LinkageError: loader (instance of 
org/apache/spark/executor/ChildExecutorURLClassLoader$userClassLoader$):
attempted  duplicate class definition for name:
"com/amazonaws/services/kinesis/clientlibrary/lib/worker/InitialPositionInStream"
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)



Is there any solution to this problem?








Re: Linkage error - duplicate class definition

2015-01-20 Thread Hafiz Mujadid
Have you solved this problem?







com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain

2015-01-19 Thread Hafiz Mujadid
Hi all!

I am trying to use Kinesis and Spark Streaming together. When I execute the
program I get the exception com.amazonaws.AmazonClientException: Unable to load
AWS credentials from any provider in the chain.


Here is my piece of code

val credentials = new BasicAWSCredentials(
  KinesisProperties.AWS_ACCESS_KEY_ID,
  KinesisProperties.AWS_SECRET_KEY)

var kinesisClient: AmazonKinesisClient = new AmazonKinesisClient(credentials)
kinesisClient.setEndpoint(
  KinesisProperties.KINESIS_ENDPOINT_URL,
  KinesisProperties.KINESIS_SERVICE_NAME,
  KinesisProperties.KINESIS_REGION_ID)

System.setProperty("aws.accessKeyId", KinesisProperties.AWS_ACCESS_KEY_ID)
System.setProperty("aws.secretKey", KinesisProperties.AWS_SECRET_KEY)
System.setProperty("AWS_ACCESS_KEY_ID", KinesisProperties.AWS_ACCESS_KEY_ID)
System.setProperty("AWS_SECRET_KEY", KinesisProperties.AWS_SECRET_KEY)

val numShards = kinesisClient
  .describeStream(KinesisProperties.MY_STREAM_NAME)
  .getStreamDescription().getShards().size()
val numStreams = numShards

val ssc = StreamingHelper.getStreamingInstance(
  new Duration(KinesisProperties.KINESIS_CHECKPOINT_INTERVAL))
ssc.addStreamingListener(new MyStreamListener)

val kinesisStreams = (0 until numStreams).map { i =>
  KinesisUtils.createStream(ssc,
    KinesisProperties.MY_STREAM_NAME,
    KinesisProperties.KINESIS_ENDPOINT_URL,
    new Duration(KinesisProperties.KINESIS_CHECKPOINT_INTERVAL),
    InitialPositionInStream.TRIM_HORIZON,
    StorageLevel.MEMORY_AND_DISK_2)
}

/* Union all the streams */
val unionStreams = ssc.union(kinesisStreams)
val tmp_stream = unionStreams.map(byteArray => new String(byteArray))
val data = tmp_stream.window(
  Seconds(KinesisProperties.WINDOW_INTERVAL),
  Seconds(KinesisProperties.SLIDING_INTERVAL))

data.foreachRDD((rdd: RDD[String], time: Time) => {
  if (rdd.take(1).size == 1) {
    rdd.saveAsTextFile(KinesisProperties.Sink + time.milliseconds)
  }
})

ssc.start()
ssc.awaitTermination()



Any suggestion?






kinesis multiple records adding into stream

2015-01-16 Thread Hafiz Mujadid
Hi Experts!

I am using the following Kinesis dependency:

groupId = org.apache.spark
artifactId = spark-streaming-kinesis-asl_2.10
version = 1.2.0

It pulls in AWS SDK version 1.8.3, and in that SDK multiple records cannot be
put in a single request. Is it possible to put multiple records in a single
request?


thanks






Re: Inserting an element in RDD[String]

2015-01-15 Thread Hafiz Mujadid
thanks

On Thu, Jan 15, 2015 at 7:35 PM, Prannoy [via Apache Spark User List] <
ml-node+s1001560n21163...@n3.nabble.com> wrote:

> Hi,
>
> You can take the schema line in another rdd and than do a union of the two
> rdd .
>
> List<String> schemaList = new ArrayList<String>();
> schemaList.add("xyz");
>
> // where xyz is your schema line
>
> JavaRDD<String> schemaRDD = sc.parallelize(schemaList);
>
> // where sc is your sparkcontext
>
> JavaRDD<String> newRDD = schemaRDD.union(yourRDD);
>
> // where yourRDD is your another rdd starting of which you want to add the
> schema line.
>
> 
>
> Thanks.
>
>
>
>
>
> On Thu, Jan 15, 2015 at 7:46 PM, Hafiz Mujadid [via Apache Spark User
> List] <[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=21163&i=0>> wrote:
>
>> hi experts!
>>
>> I hav an RDD[String] and i want to add schema line at beginning in this
>> rdd. I know RDD is immutable. So is there anyway to have a new rdd with one
>> schema line and contents of previous rdd?
>>
>>
>> Thanks
>>



-- 
Regards: HAFIZ MUJADID





Inserting an element in RDD[String]

2015-01-15 Thread Hafiz Mujadid
hi experts!

I have an RDD[String] and I want to add a schema line at the beginning of this
RDD. I know an RDD is immutable, so is there any way to get a new RDD with one
schema line followed by the contents of the previous RDD?
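A sketch of the usual trick: put the header in a one-element RDD and union it in
front of the data RDD. The header text and the dataRdd name are placeholders;
the header partition comes first, so it is written first.

val header = sc.parallelize(Seq("name,age,city"), 1)   // single-partition header RDD
val withHeader = header.union(dataRdd)                 // header partition precedes the data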


Thanks






kinesis creating stream scala code exception

2015-01-15 Thread Hafiz Mujadid
Hi experts! I want to consume data from a Kinesis stream using Spark Streaming.
I am trying to create the Kinesis stream using Scala code. Here is my code:

def main(args: Array[String]) {
  println("Stream creation started")
  if (create(2))
    println("Stream is created successfully")
}

def create(shardCount: Int): Boolean = {
  val credentials = new BasicAWSCredentials(
    KinesisProperties.AWS_ACCESS_KEY_ID,
    KinesisProperties.AWS_SECRET_KEY)

  var kinesisClient: AmazonKinesisClient = new AmazonKinesisClient(credentials)
  kinesisClient.setEndpoint(
    KinesisProperties.KINESIS_ENDPOINT_URL,
    KinesisProperties.KINESIS_SERVICE_NAME,
    KinesisProperties.KINESIS_REGION_ID)

  val createStreamRequest = new CreateStreamRequest()
  createStreamRequest.setStreamName(KinesisProperties.myStreamName)
  createStreamRequest.setShardCount(shardCount)

  val describeStreamRequest = new DescribeStreamRequest()
  describeStreamRequest.setStreamName(KinesisProperties.myStreamName)

  try {
    Thread.sleep(12)
  } catch {
    case e: Exception =>
  }

  var streamStatus = "not active"
  while (!streamStatus.equalsIgnoreCase("ACTIVE")) {
    try {
      Thread.sleep(1000)
    } catch {
      case e: Exception => e.printStackTrace()
    }
    try {
      val describeStreamResponse = kinesisClient.describeStream(describeStreamRequest)
      streamStatus = describeStreamResponse.getStreamDescription.getStreamStatus
    } catch {
      case e: Exception => e.printStackTrace()
    }
  }
  if (streamStatus.equalsIgnoreCase("ACTIVE"))
    true
  else
    false
}


When I run this code I get the following exception:

Exception in thread "main" java.lang.NoSuchMethodError:
org.joda.time.format.DateTimeFormatter.withZoneUTC()Lorg/joda/time/format/DateTimeFormatter;
at com.amazonaws.auth.AWS4Signer.(AWS4Signer.java:44)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at java.lang.Class.newInstance(Class.java:379)
at
com.amazonaws.auth.SignerFactory.createSigner(SignerFactory.java:119)
at
com.amazonaws.auth.SignerFactory.lookupAndCreateSigner(SignerFactory.java:105)
at com.amazonaws.auth.SignerFactory.getSigner(SignerFactory.java:78)
at
com.amazonaws.AmazonWebServiceClient.computeSignerByServiceRegion(AmazonWebServiceClient.java:307)
at
com.amazonaws.AmazonWebServiceClient.computeSignerByURI(AmazonWebServiceClient.java:280)
at
com.amazonaws.AmazonWebServiceClient.setEndpoint(AmazonWebServiceClient.java:160)
at
com.amazonaws.services.kinesis.AmazonKinesisClient.setEndpoint(AmazonKinesisClient.java:2102)
at
com.amazonaws.services.kinesis.AmazonKinesisClient.init(AmazonKinesisClient.java:216)
at
com.amazonaws.services.kinesis.AmazonKinesisClient.(AmazonKinesisClient.java:139)
at
com.amazonaws.services.kinesis.AmazonKinesisClient.(AmazonKinesisClient.java:116)
at
com.platalytics.platform.connectors.Kinesis.App$.create(App.scala:32)
at
com.platalytics.platform.connectors.Kinesis.App$.main(App.scala:26)
at com.platalytics.platform.connectors.Kinesis.App.main(App.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)



I have the following Maven dependency:

groupId: org.apache.spark
artifactId: spark-streaming-kinesis-asl_2.10
version: 1.2.0


Any suggestion?






Re: creating a single kafka producer object for all partitions

2015-01-12 Thread Hafiz Mujadid
Actually, this code is producing a "leader not found" exception, and I am
unable to find the reason.

On Mon, Jan 12, 2015 at 4:03 PM, kevinkim [via Apache Spark User List] <
ml-node+s1001560n21098...@n3.nabble.com> wrote:

> Well, you can use coalesce() to decrease number of partition to 1.
> (It will take time and quite not efficient, tough)
>
> Regards,
> Kevin.
>
> On Mon Jan 12 2015 at 7:57:39 PM Hafiz Mujadid [via Apache Spark User
> List] <[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=21098&i=0>> wrote:
>
>> Hi experts!
>>
>>
>> I have a schemaRDD of messages to be pushed in kafka. So I am using
>> following piece of code to do that
>>
>> rdd.foreachPartition(itr => {
>> val props = new Properties()
>> props.put("metadata.broker.list",
>> brokersList)
>> props.put("serializer.class",
>> "kafka.serializer.StringEncoder")
>> props.put("compression.codec",
>> codec.toString)
>> props.put("producer.type", "sync")
>> props.put("batch.num.messages",
>> BatchSize.toString)
>> props.put("message.send.max.retries",
>> maxRetries.toString)
>> props.put("request.required.acks", "-1")
>> producer = new Producer[String,
>> String](new ProducerConfig(props))
>> itr.foreach(row => {
>> val msg =
>> row.toString.drop(1).dropRight(1)
>> this.synchronized {
>> producer.send(new
>> KeyedMessage[String, String](Topic, msg))
>> }
>> })
>> producer.close
>> })
>>
>>
>>
>> the problem with this code is that it creates kafka producer separate for
>> each partition and I want a single producer for all partitions. Is there
>> any way to achieve this?
>>
>>



-- 
Regards: HAFIZ MUJADID





creating a single kafka producer object for all partitions

2015-01-12 Thread Hafiz Mujadid
Hi experts!


I have a SchemaRDD of messages to be pushed to Kafka, so I am using the
following piece of code to do that:

rdd.foreachPartition(itr => {
  val props = new Properties()
  props.put("metadata.broker.list", brokersList)
  props.put("serializer.class", "kafka.serializer.StringEncoder")
  props.put("compression.codec", codec.toString)
  props.put("producer.type", "sync")
  props.put("batch.num.messages", BatchSize.toString)
  props.put("message.send.max.retries", maxRetries.toString)
  props.put("request.required.acks", "-1")
  producer = new Producer[String, String](new ProducerConfig(props))
  itr.foreach(row => {
    val msg = row.toString.drop(1).dropRight(1)
    this.synchronized {
      producer.send(new KeyedMessage[String, String](Topic, msg))
    }
  })
  producer.close
})



The problem with this code is that it creates a separate Kafka producer for
each partition, and I want a single producer for all partitions. Is there any
way to achieve this?
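
For reference: a producer object that is literally shared by all partitions is
generally not possible, because partitions are processed on different executors
and each executor is a separate JVM. One common compromise is one producer per
executor, reused by every partition that executor processes. Below is only a
sketch of that idea, using the same old Kafka producer API as above;
KafkaProducerHolder is a name introduced here purely for illustration, and
brokersList, Topic and rdd are assumed to be defined as in the original snippet.

import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

// Lazily-initialized, per-JVM producer: each executor creates it once and
// every partition processed on that executor reuses the same instance.
object KafkaProducerHolder {
  private var producer: Producer[String, String] = null

  def get(brokersList: String): Producer[String, String] = synchronized {
    if (producer == null) {
      val props = new Properties()
      props.put("metadata.broker.list", brokersList)
      props.put("serializer.class", "kafka.serializer.StringEncoder")
      props.put("request.required.acks", "-1")
      producer = new Producer[String, String](new ProducerConfig(props))
    }
    producer
  }
}

rdd.foreachPartition(itr => {
  // One producer per executor instead of one per partition.
  val producer = KafkaProducerHolder.get(brokersList)
  itr.foreach(row => {
    val msg = row.toString.drop(1).dropRight(1)
    producer.send(new KeyedMessage[String, String](Topic, msg))
  })
  // Do not close the producer here; it is shared by other partitions
  // running on the same executor.
})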







skipping header from each file

2015-01-08 Thread Hafiz Mujadid
Suppose I give three file paths to the Spark context to read, and each file has
its schema in the first row. How can we skip the schema (header) line of each
file?


val rdd=sc.textFile("file1,file2,file3");
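
One possible way to handle this (a sketch, assuming the header is the first
line of every file) is to drop the first line of each file separately and then
union the results; when the header text is identical in every file and never
appears as a data line, a simple filter on the combined RDD also works:

val files = Seq("file1", "file2", "file3")

// Read each file separately and drop its first line (the header sits in
// partition 0 of each per-file RDD), then union everything.
val withoutHeaders = files
  .map { path =>
    sc.textFile(path).mapPartitionsWithIndex { (idx, iter) =>
      if (idx == 0) iter.drop(1) else iter
    }
  }
  .reduce(_ union _)

// Alternative: read all files at once and filter out lines equal to the header.
val all = sc.textFile(files.mkString(","))
val header = all.first()
val data = all.filter(_ != header)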






max receiving rate in spark streaming

2015-01-07 Thread Hafiz Mujadid
Hi experts!


Is there any way to decide what an effective receiving rate for Kafka Spark
Streaming can be?

Thanks
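
There is no single correct number, but for receiver-based Kafka streaming the
rate can at least be capped so that each batch can be processed within the
batch interval. A sketch of the relevant settings (the values are only
placeholders):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("KafkaConsumer")
  // upper bound, in records per second, for each receiver
  .set("spark.streaming.receiver.maxRate", "10000")
// From Spark 1.5 onwards, backpressure can adjust the rate automatically:
// .set("spark.streaming.backpressure.enabled", "true")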






stopping streaming context

2015-01-05 Thread Hafiz Mujadid
Hi experts!

Please, is there any way to stop the Spark Streaming context when 5 batches
have completed?


Thanks
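
One way to do this (a sketch, assuming ssc is the StreamingContext) is to
register a StreamingListener that counts completed batches and stops the
context from a separate thread once the fifth batch finishes:

import java.util.concurrent.atomic.AtomicInteger
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Counts completed batches and stops the StreamingContext after n of them.
class StopAfterNBatches(ssc: StreamingContext, n: Int) extends StreamingListener {
  private val completed = new AtomicInteger(0)

  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    if (completed.incrementAndGet() == n) {
      // Stop from a separate thread so the listener callback is not blocked.
      new Thread(new Runnable {
        override def run(): Unit = ssc.stop(stopSparkContext = true, stopGracefully = true)
      }).start()
    }
  }
}

ssc.addStreamingListener(new StopAfterNBatches(ssc, 5))
ssc.start()
ssc.awaitTermination()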






getting number of partition per topic in kafka

2015-01-03 Thread Hafiz Mujadid
Hi experts!

I am currently working on Spark Streaming with Kafka, and I have a couple of
questions related to this task.

1) Is there a way to find the number of partitions given a topic name?
2) Is there a way to detect whether the Kafka server is running or not?


Thanks 
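
One possible approach to both questions (a sketch, assuming Kafka 0.8.x, where
brokers and topics register themselves in ZooKeeper; the paths below may differ
in other versions, and zkQuorum and the topic name are placeholders) is to read
the metadata directly from ZooKeeper:

import org.I0Itec.zkclient.ZkClient
import scala.collection.JavaConverters._

val zkClient = new ZkClient(zkQuorum)  // e.g. "host1:2181,host2:2181"

// 1) Number of partitions for a topic = children of its /partitions znode.
def partitionCount(topic: String): Int =
  zkClient.getChildren(s"/brokers/topics/$topic/partitions").asScala.size

// 2) "Is Kafka up?" ~ at least one broker has registered itself in ZooKeeper.
def kafkaIsUp(): Boolean =
  zkClient.exists("/brokers/ids") && !zkClient.getChildren("/brokers/ids").isEmpty

println(partitionCount("mytopic"))
println(kafkaIsUp())
zkClient.close()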







Exception in thread "main" org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4

2014-12-31 Thread Hafiz Mujadid
I am accessing HDFS with Spark's textFile method, and I receive the following error:

Exception in thread "main" org.apache.hadoop.ipc.RemoteException: Server IPC
version 9 cannot communicate with client version 4


here are my dependencies 
 


Any suggestions?
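
This particular message usually means the HDFS client libraries on the
classpath speak Hadoop 1.x IPC (version 4) while the cluster runs Hadoop 2.x
(IPC version 9). A sketch of the usual fix in sbt, with versions only as
placeholders that must match your cluster:

libraryDependencies ++= Seq(
  // Use a Spark artifact built against Hadoop 2.x, and pin hadoop-client
  // to the same version line as the HDFS cluster.
  "org.apache.spark" %% "spark-core" % "1.2.0",
  "org.apache.hadoop" % "hadoop-client" % "2.4.0"
)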






SchemaRDD to RDD[String]

2014-12-23 Thread Hafiz Mujadid
Hi dears!

I want to convert a SchemaRDD into an RDD of String. How can we do that?

Currently I am doing it like this, which is not converting correctly: there is
no exception, but the resultant strings are empty.


here is my code
def SchemaRDDToRDD(schemaRDD: SchemaRDD): RDD[String] = {
  var types = schemaRDD.schema.fields.map(field => field.dataType)
  var list = new ArrayList[String]()
  types.foreach(Type => {
    list.add(Type.toString)
  })
  schemaRDD.map(row => rowToString(row, list))
}

private def rowToString(row: Row, list: ArrayList[String]): String = {
  var record = new StringBuilder
  for (i <- 0 until row.length) {
    record.append(getValue(list.get(i), row, i))
    record.append(",")
  }
  var sub = record.setLength(record.length - 1).toString
  println(sub)
  sub
}

/** get a single value from a row object
 *  @param dataType of value
 *  @param record of type Row
 *  @param i index of column in row object
 *  @return value in string
 */
private def getValue(dataType: String, record: Row, i: Int): String = {
  var res = ""
  if (dataType.equalsIgnoreCase("IntegerType"))
    res += record.getInt(i)
  else if (dataType.equalsIgnoreCase("FloatType"))
    res += record.getFloat(i)
  else if (dataType.equalsIgnoreCase("LongType"))
    res += record.getLong(i)
  else if (dataType.equalsIgnoreCase("DoubleType"))
    res += record.getDouble(i)
  else if (dataType.equalsIgnoreCase("TimestampType"))
    res += record.getString(i)
  else if (dataType.equalsIgnoreCase("ShortType"))
    res += record.getShort(i)
  else if (dataType.equalsIgnoreCase("ByteType"))
    res += record.getByte(i)
  else if (dataType.equalsIgnoreCase("StringType"))
    res = record.getString(i)
  else
    res = record.getString(i)
  println(res)
  res
}
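
Two notes, for reference. First, record.setLength(...) returns Unit, so
record.setLength(record.length - 1).toString yields the string "()" rather than
the accumulated record; calling record.toString after setLength avoids that.
Second, a much shorter route is possible, assuming the default string rendering
of each column is acceptable, because Row already knows how to render its
fields. A minimal sketch:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SchemaRDD

// mkString(",") joins the column values of each Row with commas.
def schemaRDDToRDD(schemaRDD: SchemaRDD): RDD[String] =
  schemaRDD.map(row => row.mkString(","))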










Re: removing first record from RDD[String]

2014-12-23 Thread Hafiz Mujadid
Yep Michael Quinlan, it's working as suggested by Hoe Ren.

Thanks to you and Hoe Ren.






Re: removing first record from RDD[String]

2014-12-23 Thread Hafiz Mujadid
that's nice if it works 






removing first record from RDD[String]

2014-12-23 Thread Hafiz Mujadid
hi dears!

Is there an efficient way to drop the first line of an RDD[String]?

any suggestion?

Thanks
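
One possible way (a sketch; the suggestion actually given in the replies is not
reproduced here) is to index the lines and keep everything except index 0:

// zipWithIndex pairs each line with its global position (it runs an extra
// job to compute partition sizes); dropping index 0 removes the first line.
val withoutFirst = rdd.zipWithIndex()
  .filter { case (_, idx) => idx != 0L }
  .map { case (line, _) => line }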






Re: reading files recursively using spark

2014-12-19 Thread Hafiz Mujadid
thanks bethesda!

But if we have a structure like this:

a/b/a.txt
a/c/c.txt
a/d/e/e.txt

then how can we handle this case?
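
For an arbitrary directory depth, one possible approach (a sketch, assuming a
Hadoop 2.x client, where FileSystem.listFiles supports recursion) is to walk
the tree with the Hadoop FileSystem API and hand the collected paths to
textFile; when the depth is known in advance, a plain glob such as
sc.textFile("a/*/*.txt,a/*/*/*.txt") also works.

import org.apache.hadoop.fs.{FileSystem, Path}
import scala.collection.mutable.ArrayBuffer

// Recursively collect every file path under the root directory.
val fs = FileSystem.get(sc.hadoopConfiguration)
val files = ArrayBuffer[String]()
val it = fs.listFiles(new Path("a"), true)  // true = recurse into sub-directories
while (it.hasNext) {
  files += it.next().getPath.toString
}

val rdd = sc.textFile(files.mkString(","))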







reading files recursively using spark

2014-12-19 Thread Hafiz Mujadid
Hi experts!

What is an efficient way to read all files from a directory and its
sub-directories using Spark? Currently I move all files from the directory and
its sub-directories into another temporary directory and then read them all
using the sc.textFile method. But I want an approach that avoids the cost of
moving the files to a temporary directory.

Thanks 






Saving Data only if Dstream is not empty

2014-12-08 Thread Hafiz Mujadid
Hi Experts!

I want to save a DStream to HDFS only if it is not empty, i.e. it contains
some Kafka messages to be stored. What is an efficient way to do this?

var data = KafkaUtils.createStream[Array[Byte], Array[Byte], DefaultDecoder,
  DefaultDecoder](ssc, params, topicMap, StorageLevel.MEMORY_ONLY).map(_._2)

val streams = data.window(Seconds(interval * 4), Seconds(interval * 2))
  .map(x => new String(x))
// streams.foreachRDD(rdd => rdd.foreach(println))

// what condition can be applied here to store only a non-empty DStream?
streams.saveAsTextFiles(sink, "msg")
Thanks
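
One possible way (a sketch; sink and streams are taken from the snippet above)
is to replace saveAsTextFiles with foreachRDD and skip batches that carry no
data. rdd.take(1) is a cheap emptiness probe that also works on Spark versions
predating RDD.isEmpty (added in 1.3):

streams.foreachRDD { (rdd, time) =>
  if (rdd.take(1).nonEmpty) {
    // Mimic the saveAsTextFiles naming scheme, one directory per batch.
    rdd.saveAsTextFile(s"$sink-${time.milliseconds}.msg")
  }
}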







spark Exception while performing saveAsTextFiles

2014-12-07 Thread Hafiz Mujadid
I am facing the following exception while saving a DStream to HDFS:


14/12/08 12:14:26 INFO DAGScheduler: Failed to run saveAsTextFile at
DStream.scala:788
14/12/08 12:14:26 ERROR JobScheduler: Error running job streaming job
1418022865000 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 10
in stage 4.0 failed 4 times, most recent failure: Lost task 10.3 in stage
4.0 (TID 117, server2): java.lang.Exception: Could not compute split, block
input-0-1418022857000 not found
org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:180)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)



The code was running fine two days ago, but now I am facing this error. What
can be the reason?
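
A common cause of "Could not compute split, block input-... not found" is that
the receiver's in-memory blocks were evicted or cleaned up before the job that
needed them ran, which typically happens when batch processing falls behind the
batch interval. A sketch of a more forgiving setup (assuming a receiver-based
Kafka stream; ssc, params and topicMap are placeholders borrowed from the
earlier snippets):

import kafka.serializer.DefaultDecoder
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaUtils

// Replicate receiver blocks and let them spill to disk instead of being
// dropped from memory under pressure.
val data = KafkaUtils.createStream[Array[Byte], Array[Byte], DefaultDecoder, DefaultDecoder](
  ssc, params, topicMap, StorageLevel.MEMORY_AND_DISK_SER_2).map(_._2)

Keeping each batch's processing time below the batch interval (more parallelism
or a longer interval) is the more fundamental fix.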







Re: Example usage of StreamingListener

2014-12-04 Thread Hafiz Mujadid
Thanks Akhil,
you are very helpful.







Re: running Spark Streaming just once and stop it

2014-12-04 Thread Hafiz Mujadid
Hi Kal El!

Have you managed to stop streaming after the first iteration? If yes, can you
share example code?


Thanks






Example usage of StreamingListener

2014-12-04 Thread Hafiz Mujadid
Hi!

Does anybody have a useful example of the StreamingListener interface? When
and how can we use this interface to stop streaming once one batch of data
has been processed?

Thanks a lot
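
A minimal sketch of the interface (assuming ssc is the StreamingContext):
override only the callbacks of interest, here onBatchCompleted, and register
the listener before starting the context. To stop streaming after the first
completed batch, the same callback can call ssc.stop from a separate thread, as
in the sketch under "stopping streaming context" earlier in this digest.

import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Report scheduling and processing delay for every completed batch.
class BatchTimingListener extends StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    val info = batchCompleted.batchInfo
    println(s"Batch ${info.batchTime}: " +
      s"scheduling delay = ${info.schedulingDelay.getOrElse(-1L)} ms, " +
      s"processing delay = ${info.processingDelay.getOrElse(-1L)} ms")
  }
}

ssc.addStreamingListener(new BatchTimingListener)
ssc.start()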






Spark Streaming empty RDD issue

2014-12-03 Thread Hafiz Mujadid
Hi Experts
I am using Spark Streaming with Kafka for real-time data processing, and I am
facing some issues related to Spark Streaming. I want to know how we can detect
that:
1) our connection has been lost,
2) our receiver is down,
3) Spark Streaming has no new messages to consume.

How can we deal with these issues?

I will be glad to hear from you and will be thankful to you.






Re: converting DStream[String] into RDD[String] in spark streaming

2014-12-03 Thread Hafiz Mujadid
Thanks, dear. It is good to save this data to HDFS and then load it back into
an RDD :)






converting DStream[String] into RDD[String] in spark streaming

2014-12-03 Thread Hafiz Mujadid
Hi everyOne!

I want to convert a DStream[String] into an RDD[String], and I could not find
how to do this.

var data = KafkaUtils.createStream[Array[Byte], Array[Byte], DefaultDecoder,
  DefaultDecoder](ssc, consumerConfig, topicMap,
  StorageLevel.MEMORY_ONLY).map(_._2)
val streams = data.window(Seconds(interval), Seconds(interval))
  .map(x => new String(x))

Now I want to convert this stream into a single RDD[String].

Any help please.
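
For reference: a DStream is a sequence of RDDs, one per batch interval, so
there is no single RDD covering the whole stream while it is running. Two
common workarounds are sketched below (the HDFS path is only a placeholder);
the second matches the suggestion in the reply above, i.e. persist the stream
and read it back as one RDD after streaming has stopped.

// (a) Work with the RDD of each batch as it arrives.
streams.foreachRDD { rdd =>
  // rdd is an ordinary RDD[String] holding this batch's data
  println(s"batch size = ${rdd.count()}")
}

// (b) Persist every batch, then load everything back as a single RDD
//     once the streaming job has finished.
streams.saveAsTextFiles("hdfs:///tmp/stream-dump")
// ... after ssc.stop() ...
val combined = ssc.sparkContext.textFile("hdfs:///tmp/stream-dump*")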






Re: getting first N messages from a Kafka topic using Spark Streaming

2014-12-03 Thread Hafiz Mujadid
Hi Akhil!

Thanks for your response. Can you please suggest how I can return this sample
from a function to the caller and then stop Spark Streaming?

Thanks






getting first N messages from a Kafka topic using Spark Streaming

2014-12-03 Thread Hafiz Mujadid
Hi Experts!

Is there a way to read the first N messages from a Kafka stream, put them in
some collection, return it to the caller for visualization purposes, and then
close Spark Streaming?

I will be glad to hear from you and will be thankful to you.

Currently I have the following code:

def getsample(params: scala.collection.immutable.Map[String, String]): Unit = {
  if (params.contains("zookeeperQourum"))
    zkQuorum = params.get("zookeeperQourum").get
  if (params.contains("userGroup"))
    group = params.get("userGroup").get
  if (params.contains("topics"))
    topics = params.get("topics").get
  if (params.contains("numberOfThreads"))
    numThreads = params.get("numberOfThreads").get
  if (params.contains("sink"))
    sink = params.get("sink").get
  if (params.contains("batchInterval"))
    interval = params.get("batchInterval").get.toInt

  val sparkConf = new SparkConf()
    .setAppName("KafkaConsumer")
    .setMaster("spark://cloud2-server:7077")
  val ssc = new StreamingContext(sparkConf, Seconds(interval))
  val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap

  var consumerConfig = scala.collection.immutable.Map.empty[String, String]
  consumerConfig += ("auto.offset.reset" -> "smallest")
  consumerConfig += ("zookeeper.connect" -> zkQuorum)
  consumerConfig += ("group.id" -> group)

  var data = KafkaUtils.createStream[Array[Byte], Array[Byte], DefaultDecoder,
    DefaultDecoder](ssc, consumerConfig, topicMap, StorageLevel.MEMORY_ONLY).map(_._2)
  val streams = data.window(Seconds(interval), Seconds(interval)).map(x => new String(x))

  streams.foreach(rdd => rdd.foreachPartition(itr => {
    while (itr.hasNext && size >= 0) {
      var msg = itr.next
      println(msg)
      sample.append(msg)
      sample.append("\n")
      size -= 1
    }
  }))

  ssc.start()
  ssc.awaitTermination(5000)
  ssc.stop(true)
}

Here, sample is a StringBuilder; when I print the contents of this
StringBuilder after the getSample method call returns, I get nothing in it.


Any help will be appreciated  
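
For reference, a likely reason the StringBuilder stays empty: the
foreachPartition closure runs on the executors, so sample and size are only
mutated inside serialized copies of those variables and the driver never sees
the changes. A sketch of a driver-side alternative (sample, size, streams and
ssc as in the code above):

streams.foreachRDD { rdd =>
  // take() runs on the driver and returns at most `size` records.
  val firstN = rdd.take(size)
  firstN.foreach { msg =>
    sample.append(msg).append("\n")
  }
  size -= firstN.length
  if (size <= 0) {
    // Stop from a separate thread so the streaming scheduler is not blocked.
    new Thread(new Runnable {
      override def run(): Unit = ssc.stop(stopSparkContext = true, stopGracefully = false)
    }).start()
  }
}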


