Redirect Spark Logs to Kafka

2016-02-01 Thread Ashish Soni
Hi All, please let me know how we can redirect Spark's log output, or tell Spark to log to a Kafka queue instead of to files. Ashish

Failed job not throwing exception

2016-02-01 Thread Nick Buroojy
Hi All, I'm seeing some odd behavior with a Spark-Elasticsearch loading job. The load seems to fail, it only loads part of the data, and there is a stacktrace on the Driver, but there is no exception propagated to the top level. Has anyone seen "swallowed" failures before? We're using spark

How to deal with same class mismatch?

2016-02-01 Thread Daniel Valdivia
Hi, I'm having a couple of issues. I'm experiencing a known issue on the spark-shell where I'm getting a type mismatch for the right class :82: error: type mismatch; found :

How to build interactive dashboards with spark?

2016-02-01 Thread Andy Davidson
Over the weekend I started playing around with Shiny. I built a very simple Shiny app using RStudio. Shiny makes it easy to build a web page that interacts with R. https://aedwip.shinyapps.io/developingDataProducts_CourseProject/ http://shiny.rstudio.com/ https://www.shinyapps.io/ Two

RE: Using Java spring injection with spark

2016-02-01 Thread Sambit Tripathy (RBEI/EDS1)
1. It depends on what you want to do. Don't worry about singletons and wiring the beans, as that is pretty much taken care of by the Spark framework itself. In fact, doing so you will run into issues like serialization errors. 2. You can write your code using Scala/ Python using the spark

Getting the size of a broadcast variable

2016-02-01 Thread apu mishra . rr
How can I determine the size (in bytes) of a broadcast variable? Do I need to use the .dump method and then look at the size of the result, or is there an easier way? Using PySpark with Spark 1.6. Thanks! Apu

try to read multiple bz2 files in s3

2016-02-01 Thread Lin, Hao
When I tried to read multiple bz2 files from s3, I have the following warning messages. What is the problem here? 16/02/01 22:30:30 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 10.162.67.248): java.lang.ArrayIndexOutOfBoundsException: -1844424343 at

Re: SPARK_WORKER_INSTANCES deprecated

2016-02-01 Thread Ted Yu
As the message (from SparkConf.scala) showed, you shouldn't use SPARK_WORKER_INSTANCES any more. FYI On Mon, Feb 1, 2016 at 2:19 PM, Lin, Hao wrote: > Can I still use SPARK_WORKER_INSTANCES in conf/spark-env.sh? the > following is what I’ve got after trying to set this

SPARK_WORKER_INSTANCES deprecated

2016-02-01 Thread Lin, Hao
Can I still use SPARK_WORKER_INSTANCES in conf/spark-env.sh? The following is what I've got after trying to set this parameter and run spark-shell: SPARK_WORKER_INSTANCES was detected (set to '32'). This is deprecated in Spark 1.0+. Please instead use: - ./spark-submit with --num-executors to
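A hedged example of the replacement the deprecation message points to (the master URL, class name, jar and sizing values below are placeholders, not taken from the thread):

  ./bin/spark-submit \
    --master spark://master-host:7077 \
    --num-executors 32 \
    --executor-cores 2 \
    --executor-memory 4g \
    --class com.example.MyApp myapp.jar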

Master failover and active jobs

2016-02-01 Thread aant00
Hi - I'm running Spark 1.5.2 in standalone mode with multiple masters using zookeeper for failover. The master fails over correctly to the standby when it goes down, and running applications continue to run, but in the new active master web UI, they are marked as "WAITING", and the workers have

Re: Need help in spark-Scala program

2016-02-01 Thread Vinti Maheshwari
Hi, Sorry, please ignore my message, It was sent by mistake. I am still drafting. Regards, Vinti On Mon, Feb 1, 2016 at 2:25 PM, Vinti Maheshwari wrote: > Hi All, > > I recently started learning Spark. I need to use spark-streaming. > > 1) Input, need to read from

RE: SPARK_WORKER_INSTANCES deprecated

2016-02-01 Thread Lin, Hao
If you look at the Spark Doc, variable SPARK_WORKER_INSTANCES can still be specified but yet the SPARK_EXECUTOR_INSTANCES http://spark.apache.org/docs/1.5.2/spark-standalone.html From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Monday, February 01, 2016 5:45 PM To: Lin, Hao Cc: user Subject:

Re: local class incompatible: stream classdesc serialVersionUID

2016-02-01 Thread Holden Karau
So I'm a little confused as to exactly how this might have happened - but one quick guess is that maybe you've built an assembly jar with Spark core; can you mark it as provided and/or post your build file? On Fri, Jan 29, 2016 at 7:35 AM, Ted Yu wrote: > I logged

Re: local class incompatible: stream classdesc serialVersionUID

2016-02-01 Thread Holden Karau
ah yah, that would not work. On Mon, Feb 1, 2016 at 2:31 PM, Shixiong(Ryan) Zhu wrote: > I guess he used client mode and the local Spark version is 1.5.2 but the > standalone Spark version is 1.5.1. In other words, he used a 1.5.2 driver > to talk with 1.5.1 executors.

Re: SPARK_WORKER_INSTANCES deprecated

2016-02-01 Thread Ted Yu
I see. A pull request can be submitted for spark-standalone.md On Mon, Feb 1, 2016 at 2:51 PM, Lin, Hao wrote: > If you look at the Spark Doc, variable SPARK_WORKER_INSTANCES can still > be specified but yet the SPARK_EXECUTOR_INSTANCES > > > >

Using accumulator to push custom logs to driver

2016-02-01 Thread Utkarsh Sengar
I am trying to debug code executed in executors by logging. Even when I add log4j's LOG.info(..) inside .map() I don't see it in the mesos task logs on the corresponding slaves. It's anyway inefficient to keep checking multiple slaves for logs. One way to deal with this is to push logs to a central

Spark Streaming application designing question

2016-02-01 Thread Vinti Maheshwari
Hi, I am new to spark. I wanted to do a spark streaming setup to retrieve key value pairs from files of the below format: file: info1 Note: Each info file will have around 1000 of these records. And our system continuously generates info files. So through spark streaming I wanted to aggregate results.

Need help in spark-Scala program

2016-02-01 Thread Vinti Maheshwari
Hi All, I recently started learning Spark. I need to use spark-streaming. 1) Input, need to read from MongoDB db.event_gcovs.find({executions:"56791a746e928d7b176d03c0", valid:1, infofile:{$exists:1}, geo:"sunnyvale"}, {infofile:1}).count() > Number of Info files: 24441 /* 0 */ { "_id" :

Re: local class incompatible: stream classdesc serialVersionUID

2016-02-01 Thread Shixiong(Ryan) Zhu
I guess he used client mode and the local Spark version is 1.5.2 but the standalone Spark version is 1.5.1. In other words, he used a 1.5.2 driver to talk with 1.5.1 executors. On Mon, Feb 1, 2016 at 2:08 PM, Holden Karau wrote: > So I'm a little confused as to exactly how

Re: Redirect Spark Logs to Kafka

2016-02-01 Thread Burak Yavuz
You can use the KafkaLog4jAppender ( https://github.com/apache/kafka/blob/trunk/log4j-appender/src/main/java/org/apache/kafka/log4jappender/KafkaLog4jAppender.java ). Best, Burak On Mon, Feb 1, 2016 at 12:20 PM, Ashish Soni wrote: > Hi All , > > Please let me know how we
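A minimal log4j.properties sketch of how that appender could be wired in (the property names follow the appender class linked above, but the broker address, topic name and layout are illustrative assumptions):

  # route Spark's log4j output to a Kafka topic
  # ('console' refers to the appender already defined in Spark's default conf/log4j.properties)
  log4j.rootLogger=INFO, console, KAFKA
  log4j.appender.KAFKA=org.apache.kafka.log4jappender.KafkaLog4jAppender
  log4j.appender.KAFKA.brokerList=broker1:9092
  log4j.appender.KAFKA.topic=spark-logs
  log4j.appender.KAFKA.layout=org.apache.log4j.PatternLayout
  log4j.appender.KAFKA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n

The kafka-log4j-appender jar would also need to be on the driver and executor classpaths for this to take effect.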

Re: Using accumulator to push custom logs to driver

2016-02-01 Thread Holden Karau
I wouldn't use accumulators for things which could get large; they can become kind of a bottleneck. Do you have a lot of string messages you want to bring back or only a few? On Mon, Feb 1, 2016 at 3:24 PM, Utkarsh Sengar wrote: > I am trying to debug code executed in

Re: try to read multiple bz2 files in s3

2016-02-01 Thread Robert Collich
Hi Hao, Could you please post the corresponding code? Are you using textFile or sc.parallelize? On Mon, Feb 1, 2016 at 2:36 PM Lin, Hao wrote: > When I tried to read multiple bz2 files from s3, I have the following > warning messages. What is the problem here? > > > >

Spark Standalone cluster job to connect Hbase is Stuck

2016-02-01 Thread sudhir patil
Spark job on Standalone cluster is stuck, shows no logs after "util.AkkaUtils: Connecting to HeartbeatReceiver" on worker nodes and "storage.BlockmanagerInfo: Added broadcast..." on the client driver side. It would be great if you could clarify any of these (or better, all of these :) 1. Did anyone see

Re: Guidelines for writing SPARK packages

2016-02-01 Thread Burak Yavuz
Thanks for the reply David, just wanted to fix one part of your response: > If you > want to register a release for your package you will also need to push > the artifacts for your package to Maven central. > It is NOT necessary to push to Maven Central in order to make a release. There are

Re: Spark Standalone cluster job to connect Hbase is Stuck

2016-02-01 Thread sudhir patil
Thanks Ted for the quick reply. I am using spark 1.2, exporting the Hbase conf directory containing hbase-site.xml in HADOOP_CLASSPATH & SPARK_CLASSPATH. Do I need to do anything else? Issues connecting to kerberos Hbase through a spark yarn cluster are fixed in spark 1.4+, so I am trying if it works in

Re: Using accumulator to push custom logs to driver

2016-02-01 Thread Holden Karau
Ah, if it's manual ad-hoc logging of the 100 to 200 lines then that's probably OK. On Mon, Feb 1, 2016 at 3:48 PM, Utkarsh Sengar wrote: > Not a lot of string messages, I need it mostly for debugging purposes which > I will use on an ad-hoc basis - manually add debug
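For that kind of low-volume, ad-hoc debugging, a rough sketch using the Spark 1.x collection-accumulator API could look like the following (the filter condition and record contents are hypothetical):

  import scala.collection.mutable.ArrayBuffer

  // driver side: an accumulator that gathers a handful of debug strings
  val debugLog = sc.accumulableCollection(ArrayBuffer[String]())

  rdd.foreach { record =>
    // executor side: only log the few records of interest
    if (record.toString.contains("56791a74"))   // hypothetical condition
      debugLog += s"matched: $record"
  }

  // back on the driver, after the action has completed
  debugLog.value.foreach(println)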

questions about progress bar status [stuck]?

2016-02-01 Thread charles li
code: --- total = int(1e8) local_collection = range(1, total) rdd = sc.parallelize(local_collection) res = rdd.collect() --- web ui status: (screenshot omitted) problems: --- 1. from the status bar, it seems that about half the tasks should be done, but it just says there is

RE: saveAsTextFile is not writing to local fs

2016-02-01 Thread Mohammed Guller
If the data is not too big, one option is to call the collect method and then save the result to a local file using standard Java/Scala API. However, keep in mind that this will transfer data from all the worker nodes to the driver program. Looks like that is what you want to do anyway, but you
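A minimal sketch of that approach, assuming the collected result is small enough for the driver's memory (the output path is a placeholder):

  import java.io.PrintWriter

  // pulls every partition back to the driver -- only safe for small results
  val lines = rdd.map(_.toString).collect()

  val out = new PrintWriter("/tmp/result.txt")   // local path on the driver machine
  try lines.foreach(out.println) finally out.close()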

unsubscribe email

2016-02-01 Thread Eduardo Costa Alfaia
Hi Guys, How could I unsubscribe the email e.costaalf...@studenti.unibs.it, which is an alias of my email e.costaalf...@unibs.it and is registered on the mailing list? Thanks Eduardo Costa Alfaia PhD Student Telecommunication Engineering Università degli Studi di Brescia-UNIBS --

RE: saveAsTextFile is not writing to local fs

2016-02-01 Thread Mohammed Guller
You should not be saving an RDD to local FS if Spark is running on a real cluster. Essentially, each Spark worker will save the partitions that it processes locally. Check the directories on the worker nodes and you should find pieces of your file on each node. Mohammed Author: Big Data

how to covert millisecond time to SQL timeStamp

2016-02-01 Thread Andy Davidson
What little I know about working with timestamps is based on https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html Using the example of dates formatted into human-friendly strings -> timeStamps I was able to figure out how

Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-02-01 Thread Jia Zou
Hi Calvin, I am running a 24GB-data Spark KMeans job in a c3.2xlarge AWS instance with 30GB physical memory. Spark will cache data off-heap to Tachyon, and the input data is also stored in Tachyon. Tachyon is configured to use 15GB memory and a tiered store. Tachyon underFS is /tmp. The only

Re: saveAsTextFile is not writing to local fs

2016-02-01 Thread Siva
Hi Mohammed, Thanks for your response. Data is available on the worker nodes, but I was looking for something to write directly to the local fs. Seems like it is not an option. Thanks, Sivakumar Bhavanari. On Mon, Feb 1, 2016 at 5:45 PM, Mohammed Guller wrote: > You should not be

Re: How to control the number of files for dynamic partition in Spark SQL?

2016-02-01 Thread Benyi Wang
Thanks Deenar, both two methods work. I actually tried the second method in spark-shell, but it didn't work at that time. The reason might be: I registered the data frame eventwk as a temporary table, repartition, then register the table again. Unfortunately I could not reproduce it. Thanks

Re: Spark Standalone cluster job to connect Hbase is Stuck

2016-02-01 Thread Ted Yu
Is the hbase-site.xml on the classpath of the worker nodes ? Which Spark release are you using ? Cheers On Mon, Feb 1, 2016 at 4:25 PM, sudhir patil wrote: > Spark job on Standalone cluster is Stuck, shows no logs after > "util.AkkaUtils: Connecting to

Re: Spark Standalone cluster job to connect Hbase is Stuck

2016-02-01 Thread Ted Yu
From your first email, it seems that you don't observe output from the hbase client. Spark 1.2 was quite old, missing fixes for log4j such as SPARK-9826. Can you change the following line in HBase's conf/log4j.properties from: log4j.logger.org.apache.hadoop.hbase=INFO to:

Error w/ Invertable ReduceByKeyAndWindow

2016-02-01 Thread Bryan Jeffrey
Hello. I have a reduceByKeyAndWindow function with an invertible function and filter function defined. I am seeing an error as follows: "Neither previous window has value for key, nor new values found. Are you sure your key class hashes consistently?" We're using case classes, and so I am sure

Re: Error w/ Invertable ReduceByKeyAndWindow

2016-02-01 Thread Bryan Jeffrey
Excuse me - I should have mentioned: I am running Spark 1.4.1, Scala 2.11. I am running in streaming mode receiving data from Kafka. Regards, Bryan Jeffrey On Mon, Feb 1, 2016 at 9:19 PM, Bryan Jeffrey wrote: > Hello. > > I have a reduceByKeyAndWindow function with an

Re: Unpersist RDD in Graphx

2016-02-01 Thread Takeshi Yamamuro
Hi, Please call "Graph#unpersist", which releases two RDDs, the vertex and edge ones. "Graph#unpersist" just invokes "Graph#unpersistVertices" and "Graph#edges#unpersist"; "Graph#unpersistVertices" releases memory for vertices and "Graph#edges#unpersist" does the same for edges. If blocking = true,
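In code, the calls described above amount to something like this (a sketch against the GraphX API for a cached graph named graph):

  // releases both the cached vertex and edge RDDs in one call
  graph.unpersist(blocking = true)

  // equivalent to invoking the two finer-grained methods separately:
  graph.unpersistVertices(blocking = true)
  graph.edges.unpersist(blocking = true)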

Using Java spring injection with spark

2016-02-01 Thread HARSH TAKKAR
Hi, I am new to apache spark and big data analytics; before starting to code on spark data frames and rdd, I just wanted to confirm the following: 1. Can we create an implementation of java.api.Function as a singleton bean using the spring framework and, can it be injected using

Re: mapWithState: remove key

2016-02-01 Thread Udo Fholl
That makes sense. Thanks for your quick response. On Fri, Jan 29, 2016 at 7:01 PM, Shixiong(Ryan) Zhu wrote: > 1. To remove a state, you need to call "state.remove()". If you return a > None in the function, it just means don't output it as the DStream's > output, but
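A small sketch of the distinction Shixiong describes (a running count per key is just an example; the threshold and types are arbitrary):

  import org.apache.spark.streaming.State

  // mapWithState function: returning None only suppresses output for this batch;
  // state.remove() is what actually deletes the key's saved state.
  def track(key: String, value: Option[Int], state: State[Int]): Option[(String, Int)] = {
    val total = state.getOption.getOrElse(0) + value.getOrElse(0)
    if (total >= 100) {
      if (state.exists()) state.remove()   // drop this key from the state store
      None                                 // and emit nothing for it this batch
    } else {
      state.update(total)
      Some((key, total))
    }
  }

  // val tracked = keyedDStream.mapWithState(StateSpec.function(track _))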

Re: Repartition taking place for all previous windows even after checkpointing

2016-02-01 Thread Abhishek Anand
Any insights on this ? On Fri, Jan 29, 2016 at 1:08 PM, Abhishek Anand wrote: > Hi All, > > Can someone help me with the following doubts regarding checkpointing : > > My code flow is something like follows -> > > 1) create direct stream from kafka > 2) repartition

When char will be available in Spark

2016-02-01 Thread Dr Mich Talebzadeh
Hi, I am using spark on Hive. Some tables have CHAR type columns. It is my understanding that spark converts varchar columns to String internally; however the Spark version 1.5.2 that I have throws an error when the underlying Hive table has CHAR fields. I wanted to know when Varchar will be

Re: Failed to 'collect_set' with dataset in spark 1.6

2016-02-01 Thread Alexandr Dzhagriev
Hi, That's another thing: that the Record case class should be outside. I ran it as spark-submit. Thanks, Alex. On Mon, Feb 1, 2016 at 6:41 PM, Ted Yu wrote: > Running your sample in spark-shell built in master branch, I got: > > scala> val dataset =

Re: Failed to 'collect_set' with dataset in spark 1.6

2016-02-01 Thread Ted Yu
Got around the previous error by adding: scala> implicit val kryoEncoder = Encoders.kryo[RecordExample] kryoEncoder: org.apache.spark.sql.Encoder[RecordExample] = class[value[0]: binary] On Mon, Feb 1, 2016 at 9:55 AM, Alexandr Dzhagriev wrote: > Hi, > > That's another thing:

Re: Using Java spring injection with spark

2016-02-01 Thread HARSH TAKKAR
Hi Please can anyone reply on this. On Mon, 1 Feb 2016, 4:28 p.m. HARSH TAKKAR wrote: > Hi >> >> I am new to apache spark and big data analytics, before starting to code >> on spark data frames and rdd, i just wanted to confirm following >> >> 1. Can we create an

Re: Guidelines for writing SPARK packages

2016-02-01 Thread David Russell
Hi Praveen, The basic requirements for releasing a Spark package on spark-packages.org are as follows: 1. The package content must be hosted by GitHub in a public repo under the owner's account. 2. The repo name must match the package name. 3. The master branch of the repo must contain

Can't view executor logs in web UI on Windows

2016-02-01 Thread Mark Pavey
I am running Spark on Windows. When I try to view the Executor logs in the UI I get the following error: HTTP ERROR 500 Problem accessing /logPage/. Reason: Server Error Caused by: java.net.URISyntaxException: Illegal character in path at index 1: .\work/app-20160129154716-0038/2/

Re: [MLlib] What is the best way to forecast the next month page visit?

2016-02-01 Thread Jorge Machado
Hi Guru, So first transform your page names with OneHotEncoder ( https://spark.apache.org/docs/latest/ml-features.html#onehotencoder ), then do the same thing for the months. You will end up with something like: (first
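Roughly, that preprocessing could look like the following in the ml API (the DataFrame and column names are hypothetical; the same StringIndexer + OneHotEncoder pair would be repeated for the month column):

  import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}

  // visits: a DataFrame with string columns "page" and "month" plus a numeric "count"
  val pageIndexer = new StringIndexer().setInputCol("page").setOutputCol("pageIdx")
  val indexed     = pageIndexer.fit(visits).transform(visits)

  val pageEncoder = new OneHotEncoder().setInputCol("pageIdx").setOutputCol("pageVec")
  val encoded     = pageEncoder.transform(indexed)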

AFTSurvivalRegression Prediction and Quantile Probabilities

2016-02-01 Thread Christine Jula
Hello, I would like to fit a survival model with AFTSurvivalRegression. My question here is: what kind of prediction do I get with this? In the package survreg in R I can set a type of prediction ("response", "link", "lp", "linear", "terms", "quantile", "uquantile"). Besides, what can I

Spark MLLlib Ideal way to convert categorical features into LabeledPoint RDD?

2016-02-01 Thread unk1102
Hi, I have a dataset which is completely categorical and does not contain even one numerical column. Now I want to apply classification using Naive Bayes; I have to predict whether a given alert is actionable or not (YES/NO). I have the following example of my dataset

Re: Failed to 'collect_set' with dataset in spark 1.6

2016-02-01 Thread Ted Yu
Running your sample in spark-shell built in master branch, I got: scala> val dataset = sc.parallelize(Seq(RecordExample(1, "apple"), RecordExample(2, "orange"))).toDS() org.apache.spark.sql.AnalysisException: Unable to generate an encoder for inner class `RecordExample` without access to the

Re: Failed to 'collect_set' with dataset in spark 1.6

2016-02-01 Thread Alexandr Dzhagriev
Hi Ted, That doesn't help either, as one method delegates to the other as far as I can see: def collect_list(columnName: String): Column = collect_list(Column(columnName)) Thanks, Alex On Mon, Feb 1, 2016 at 5:55 PM, Ted Yu wrote: > bq. agg(collect_list("b") > > Have you

Spark Executor retries infinitely

2016-02-01 Thread Prabhu Joseph
Hi All, When a Spark job (Spark-1.5.2) is submitted with a single executor and the user passes some wrong JVM arguments with spark.executor.extraJavaOptions, the first executor fails. But the job keeps on retrying, creating a new executor and failing every time, until CTRL-C is pressed. Do

Guidelines for writing SPARK packages

2016-02-01 Thread Praveen Devarao
Hi, Are there any guidelines or specs for writing a Spark package? I would like to implement a spark package and would like to know the way it needs to be structured (implement some interfaces etc.) so that it can plug into Spark for extended functionality. Could any one help me

[ANNOUNCE] New SAMBA Package = Spark + AWS Lambda

2016-02-01 Thread David Russell
Hi all, Just sharing news of the release of a newly available Spark package, SAMBA . https://github.com/onetapbeyond/lambda-spark-executor SAMBA is an Apache Spark

Re: [MLlib] What is the best way to forecast the next month page visit?

2016-02-01 Thread diplomatic Guru
Any suggestions please? On 29 January 2016 at 22:31, diplomatic Guru wrote: > Hello guys, > > I'm trying understand how I could predict the next month page views based > on the previous access pattern. > > For example, I've collected statistics on page views: > > e.g.

Failed to 'collect_set' with dataset in spark 1.6

2016-02-01 Thread Alexandr Dzhagriev
Hello, I'm trying to run the following example code: import org.apache.spark.sql.hive.HiveContext import org.apache.spark.{SparkContext, SparkConf} import org.apache.spark.sql.functions._ case class RecordExample(a: Int, b: String) object ArrayExample { def main(args: Array[String]) {

Re: Spark Executor retries infinitely

2016-02-01 Thread Prabhu Joseph
Thanks Ted. My concern is how to avoid this kind of user error on a production cluster; it would be better if Spark handled this instead of creating an executor every second, failing, and overloading the Spark Master. Shall I report a Spark JIRA to handle this? Thanks, Prabhu Joseph On

Re: Failed to 'collect_set' with dataset in spark 1.6

2016-02-01 Thread Alexandr Dzhagriev
Hello again, Also I've tried the following snippet with concat_ws: val dataset = sc.parallelize(Seq( RecordExample(1, "apple"), RecordExample(1, "banana"), RecordExample(2, "orange")) ).toDS().groupBy($"a").agg(concat_ws(",", $"b").as[String]) dataset.take(10).foreach(println) which

Re: Spark Caching Kafka Metadata

2016-02-01 Thread Benjamin Han
Is there another way to create topics from Spark? Is there any reason the above code snippet would still produce this error? I've dumbly inserted waits and retries for testing, but that still doesn't consistently work, even after waiting several minutes. On Fri, Jan 29, 2016 at 8:29 AM, Cody

Re: java.nio.channels.ClosedChannelException in Spark Streaming KafKa Direct

2016-02-01 Thread Cody Koeninger
That indicates a problem in network communication between the executor and the kafka broker. Have you done any network troubleshooting? On Mon, Feb 1, 2016 at 9:59 AM, SRK wrote: > Hi, > > I see the following error in Spark Streaming with Kafka Direct. I think >

Re: Failed to 'collect_set' with dataset in spark 1.6

2016-02-01 Thread Ted Yu
bq. agg(collect_list("b") Have you tried: agg(collect_list($"b") On Mon, Feb 1, 2016 at 8:50 AM, Alexandr Dzhagriev wrote: > Hello, > > I'm trying to run the following example code: > > import org.apache.spark.sql.hive.HiveContext > import org.apache.spark.{SparkContext,
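For reference, one way to get that aggregation through the untyped DataFrame API looks roughly like this (a sketch; as the original post's imports suggest, collect_set/collect_list in Spark 1.6 require a HiveContext):

  import org.apache.spark.sql.functions._

  val df = dataset.toDF()
  val grouped = df.groupBy(col("a")).agg(collect_set(col("b")).as("bs"))
  grouped.show()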

Re: Spark streaming and ThreadLocal

2016-02-01 Thread N B
Is each partition guaranteed to execute in a single thread in a worker? Thanks N B On Fri, Jan 29, 2016 at 6:53 PM, Shixiong(Ryan) Zhu wrote: > I see. Then you should use `mapPartitions` rather than using ThreadLocal. > E.g., > > dstream.mapPartitions( iter -> >
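The pattern Shixiong suggests, sketched out (the SimpleDateFormat is just a stand-in for any non-thread-safe object that would otherwise live in a ThreadLocal, and the DStream of epoch millis is an assumption):

  // one instance per partition, created and used entirely inside the task
  val formatted = dstream.mapPartitions { iter =>
    val fmt = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
    iter.map(ms => fmt.format(new java.util.Date(ms)))
  }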

RE: Using Java spring injection with spark

2016-02-01 Thread Sambit Tripathy (RBEI/EDS1)
Hi Harsh, I still do not understand your problem completely. If this is what you are talking about http://stackoverflow.com/questions/30053449/use-spring-together-with-spark Best regards Sambit Tripathy From: HARSH TAKKAR [mailto:takkarha...@gmail.com] Sent: Monday, February 01, 2016 10:28

Spark Streaming with Kafka - batch DStreams in memory

2016-02-01 Thread p pathiyil
Hi, Are there any ways to store DStreams / RDD read from Kafka in memory to be processed at a later time ? What we need to do is to read data from Kafka, process it to be keyed by some attribute that is present in the Kafka messages, and write out the data related to each key when we have

How to calculate weighted degrees in GraphX

2016-02-01 Thread Balachandar R.A.
I am new to GraphX and exploring an example flight data analysis found online. http://www.sparktutorials.net/analyzing-flight-data:-a-gentle-introduction-to-graphx-in-spark I tried calculating inDegrees (to understand how many incoming flights an airport has) but I see a value which corresponds to

Re: unsubscribe email

2016-02-01 Thread Kevin Mellott
Take a look at the first section on http://spark.apache.org/community.html. You basically just need to send an email from the aliased email to user-unsubscr...@spark.apache.org. If you cannot log into that email directly, then I'd recommend using a mail client that allows for the "send-as"

Re: Getting the size of a broadcast variable

2016-02-01 Thread Takeshi Yamamuro
Hi, Currently, there is no way to check the size except for snooping INFO-logs in a driver; 16/02/02 14:51:53 INFO BlockManagerInfo: Added rdd_2_12 in memory on localhost:58536 (size: 40.0 B, free: 510.7 MB) On Tue, Feb 2, 2016 at 8:20 AM, apu mishra . rr wrote: >
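Besides scraping those INFO logs, one rough alternative (shown in Scala; the question was PySpark, so treat this as a sketch) is to estimate the size of the broadcast value itself with Spark's SizeEstimator, a developer API that reports an approximate in-memory footprint:

  import org.apache.spark.util.SizeEstimator

  val table = (1 to 1000000).map(i => i -> i.toString).toMap
  val bc = sc.broadcast(table)

  // approximate size, in bytes, of the object that was broadcast
  println(SizeEstimator.estimate(table))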

Re: Using Java spring injection with spark

2016-02-01 Thread HARSH TAKKAR
Hi Sambit, My app is basically a cron which checks the db; if there is a job that is scheduled and needs to be executed, it submits the job to spark using the spark java api. This app is written with the spring framework as its core. Each job has a set of tasks which need to be executed in an order. >

Are there open source tools which implement a draggable widget and run the app in the form of a DAG?

2016-02-01 Thread zml张明磊
Hello, I am trying to find some tools but have had no luck. So, as the title describes, are there any open source tools which implement a draggable widget and run the app in the form of a DAG-like workflow? Thanks, Minglei.

can we do column bind of 2 dataframes in spark R? similar to cbind in R?

2016-02-01 Thread Devesh Raj Singh
Hi, I want to merge 2 dataframes in SparkR column-wise, similar to cbind in R. We have "unionAll" for rbind but I could not find anything for cbind in SparkR -- Warm regards, Devesh.

What is the correct way to reset a Linux in Cluster?

2016-02-01 Thread pengzhang130
Hi All, I have a spark cluster running on 3 Linux machines. One machine runs the master, two other machines have stand-by masters, and each machine has 2 worker instances running. I found that when the machine that has the active Spark master gets reset or its network goes down, the application will freeze and sometimes can not

Re: how to covert millisecond time to SQL timeStamp

2016-02-01 Thread Ted Yu
See related thread on using Joda DateTime: http://search-hadoop.com/m/q3RTtSfi342nveex1=RE+NPE+ when+using+Joda+DateTime On Mon, Feb 1, 2016 at 7:44 PM, Kevin Mellott wrote: > I've had pretty good success using Joda-Time >

Re: Explanation for info shown in UI

2016-02-01 Thread Yogesh Mahajan
The jobs depend on the number of output operations (print, foreachRDD, saveAs*Files) and the number of RDD actions in those output operations. For example: dstream1.foreachRDD { rdd => rdd.count }// ONE Spark job per batch dstream1.foreachRDD { rdd => { rdd.count ; rdd.count } } // TWO Spark

Re: how to covert millisecond time to SQL timeStamp

2016-02-01 Thread VISHNU SUBRAMANIAN
Hi, If you need a DataFrame-specific solution, you can try the below: df.select(from_unixtime(col("max(utcTimestamp)")/1000)) On Tue, 2 Feb 2016 at 09:44 Ted Yu wrote: > See related thread on using Joda DateTime: > http://search-hadoop.com/m/q3RTtSfi342nveex1=RE+NPE+ >
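Spelled out a little, and assuming a column of epoch milliseconds named "utcTimestamp" (from_unixtime expects seconds, hence the division by 1000):

  import org.apache.spark.sql.functions._

  val withTs = df.select(
    from_unixtime(col("utcTimestamp") / 1000).cast("timestamp").as("ts"))
  withTs.printSchema()   // ts: timestamp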

Re: how to covert millisecond time to SQL timeStamp

2016-02-01 Thread Kevin Mellott
I've had pretty good success using Joda-Time for date/time manipulations within Spark applications. You may be able to use the DateTime constructor below, if you are starting with milliseconds. DateTime public DateTime(long instant) Constructs an

Re: Failed to 'collect_set' with dataset in spark 1.6

2016-02-01 Thread Alexandr Dzhagriev
Good to know, thanks. On Mon, Feb 1, 2016 at 6:57 PM, Ted Yu wrote: > Got around the previous error by adding: > > scala> implicit val kryoEncoder = Encoders.kryo[RecordExample] > kryoEncoder: org.apache.spark.sql.Encoder[RecordExample] = class[value[0]: > binary] > > On

Re: Can't view executor logs in web UI on Windows

2016-02-01 Thread Ted Yu
I did a brief search but didn't find relevant JIRA either. You can create a JIRA and submit pull request for the fix. Cheers > On Feb 1, 2016, at 5:13 AM, Mark Pavey wrote: > > I am running Spark on Windows. When I try to view the Executor logs in the UI > I get

Re: Spark Executor retries infinitely

2016-02-01 Thread Ted Yu
I haven't found config knob for controlling the retry count after brief search. According to http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html , default value for -XX:ParallelGCThreads= seems to be 8. This seems to explain why you got the VM initialization error. FYI On Mon, Feb
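For comparison, a hedged example of passing such an option in a well-formed way (the values are illustrative only; the point of the thread is that a malformed value here currently triggers the endless executor-retry loop):

  ./bin/spark-submit \
    --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:ParallelGCThreads=4" \
    --class com.example.MyJob myjob.jar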

java.nio.channels.ClosedChannelException in Spark Streaming KafKa Direct

2016-02-01 Thread SRK
Hi, I see the following error in Spark Streaming with Kafka Direct. I think that this error is related to Kafka topic. Any suggestions on how to avoid this error would be of great help. java.nio.channels.ClosedChannelException at