FrequentItems in spark-sql-execution-stat

2015-07-30 Thread Yucheng
Hi all, I'm reading the code in spark-sql-execution-stat-FrequentItems.scala, and I'm a little confused about the add method in the FreqItemCounter class. Please see the link here:

Re: Data source aliasing

2015-07-30 Thread Mridul Muralidharan
Would be a good idea to generalize this for spark core - and allow for its use in serde, compression, etc. Regards, Mridul On Thu, Jul 30, 2015 at 11:33 AM, Joseph Batchik josephbatc...@gmail.com wrote: Yep I was looking into using the jar service loader. I pushed a rough draft to my fork of

Parquet SaveMode.Append Trouble.

2015-07-30 Thread satyajit vegesna
Hi, I am new to using Spark and Parquet files. Below is what I am trying to do on the Spark shell: val df = sqlContext.parquetFile("/data/LM/Parquet/Segment/pages/part-m-0.gz.parquet") I have also tried the command below: val

add to user list

2015-07-30 Thread Sachin Aggarwal
-- Thanks Regards Sachin Aggarwal 7760502772

Re: add to user list

2015-07-30 Thread Ted Yu
Please take a look at the first section of: https://spark.apache.org/community On Thu, Jul 30, 2015 at 9:23 PM, Sachin Aggarwal different.sac...@gmail.com wrote: -- Thanks Regards Sachin Aggarwal 7760502772

Re: Parquet SaveMode.Append Trouble.

2015-07-30 Thread StanZhai
You should import org.apache.spark.sql.SaveMode -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Parquet-SaveMode-Append-Trouble-tp13529p13531.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
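
For completeness, a minimal sketch of that import together with an append-mode write, assuming the Spark 1.4+ DataFrameWriter API (the output path below is just a placeholder):

import org.apache.spark.sql.SaveMode

// Read the existing Parquet file, then write rows to a target directory in append mode.
val df = sqlContext.read.parquet("/data/LM/Parquet/Segment/pages/part-m-0.gz.parquet")
df.write.mode(SaveMode.Append).parquet("/data/LM/Parquet/Segment/pages_appended")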

Re: High availability with zookeeper: worker discovery

2015-07-30 Thread Ted Yu
zookeeper is not a direct dependency of Spark. Can you give a bit more detail on how the election / discovery of the master works? Cheers On Thu, Jul 30, 2015 at 7:41 PM, Christophe Schmitz cofcof...@gmail.com wrote: Hi there, I am trying to run a 3 node spark cluster where each node contains

Re: High availability with zookeeper: worker discovery

2015-07-30 Thread Christophe Schmitz
Hi Ted, Thanks for your reply. I think zookeeper is an optional dependency of Spark. To enable it, I essentially use these flags in all my spark-env.sh files: SPARK_DAEMON_JAVA_OPTS=-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=my-zoo-ip:2181 and of course, I have my zookeeper

High availability with zookeeper: worker discovery

2015-07-30 Thread Christophe Schmitz
Hi there, I am trying to run a 3 node spark cluster where each node contains a spark worker and a spark master. Election of the master happens via zookeeper. The way I am configuring it is by (on each node) giving the IP:PORT of the local master to the local worker, and I wish the worker could

Re: Spark (1.2) yarn allocator does not remove container request for allocated container, resulting in a bloated ask[] of containers and inefficient resource utilization of cluster resources.

2015-07-30 Thread prakhar jauhari
hey all, Thanks in advance. I am facing this issue in production, where, due to increased container requests, the RM is reserving memory and hampering cluster utilization. Thus the fix needs to be patched onto Spark 1.2. Has anyone looked into the removeContainerRequest part for allocated containers in
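
For anyone following along, here is a rough sketch of the pattern being discussed (not the actual Spark 1.2 patch): once YARN grants a container, one matching outstanding request should be removed through the AMRMClient so later allocate() heartbeats stop re-asking for it. The helper name is made up for illustration.

import scala.collection.JavaConverters._
import org.apache.hadoop.yarn.api.records.Container
import org.apache.hadoop.yarn.client.api.AMRMClient
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest

// Hypothetical helper (not Spark's YarnAllocator code): drop one outstanding request
// that matches the allocated container's priority and capability, so the pending ask
// the ResourceManager sees shrinks as containers are granted.
def removeRequestForAllocated(
    amClient: AMRMClient[ContainerRequest],
    container: Container): Unit = {
  val matching = amClient.getMatchingRequests(
    container.getPriority, "*", container.getResource) // "*" is ResourceRequest.ANY
  matching.asScala.flatMap(_.asScala).headOption.foreach(amClient.removeContainerRequest)
}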

Re: Writing streaming data to cassandra creates duplicates

2015-07-30 Thread Priya Ch
Hi All, Can someone throw some insight on this? On Wed, Jul 29, 2015 at 8:29 AM, Priya Ch learnings.chitt...@gmail.com wrote: Hi TD, Thanks for the info. I have a scenario like this. I am reading the data from a Kafka topic. Let's say Kafka has 3 partitions for the topic. In my

Data source aliasing

2015-07-30 Thread Joseph Batchik
Hi all, There are now starting to be a lot of data source packages for Spark. An annoyance I see is that I have to type in the full class name like: sqlContext.read.format("com.databricks.spark.avro").load(path). Spark internally has formats such as parquet and jdbc registered, and it would be nice
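
For context, the two call shapes being compared (the short-name form in the second line is only hypothetical here; it is what the proposal would enable):

// Today: the fully qualified provider class has to be spelled out.
val df = sqlContext.read.format("com.databricks.spark.avro").load("/path/to/data")
// Proposed: a registered short name, analogous to the built-in "parquet" and "jdbc".
val df2 = sqlContext.read.format("avro").load("/path/to/data")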

Re: Data source aliasing

2015-07-30 Thread Patrick Wendell
Yeah this could make sense - allowing data sources to register a short name. What mechanism did you have in mind? To use the jar service loader? The only issue is that there could be conflicts since many of these are third party packages. If the same name were registered twice I'm not sure what

Re: Data source aliasing

2015-07-30 Thread Michael Armbrust
+1 On Thu, Jul 30, 2015 at 11:18 AM, Patrick Wendell pwend...@gmail.com wrote: Yeah this could make sense - allowing data sources to register a short name. What mechanism did you have in mind? To use the jar service loader? The only issue is that there could be conflicts since many of these

Re: Data source aliasing

2015-07-30 Thread Joseph Batchik
Yep I was looking into using the jar service loader. I pushed a rough draft to my fork of Spark: https://github.com/JDrit/spark/commit/946186e3f17ddcc54acf2be1a34aebf246b06d2f Right now it will use the first alias it finds, but I can change that to check them all and report an error if it finds
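
For readers who don't want to open the commit, a hedged sketch of the general service-loader approach under discussion; the trait and method names below are illustrative, not what the draft (or Spark) actually uses:

import java.util.ServiceLoader
import scala.collection.JavaConverters._

// Each data source package would ship an implementation of this trait plus a
// META-INF/services entry so ServiceLoader can discover it on the classpath.
trait DataSourceAlias {
  def shortName(): String     // e.g. "avro"
  def providerClass: String   // e.g. "com.databricks.spark.avro"
}

def resolveProvider(name: String, loader: ClassLoader): String = {
  val registered = ServiceLoader.load(classOf[DataSourceAlias], loader).asScala
  registered.filter(_.shortName().equalsIgnoreCase(name)).toList match {
    case Nil           => name                 // no alias found: treat it as a class name
    case single :: Nil => single.providerClass
    case multiple      =>                      // the conflict case Patrick raised
      sys.error(s"Multiple data sources registered for alias '$name': " +
        multiple.map(_.providerClass).mkString(", "))
  }
}

The draft currently takes the first alias it finds; this sketch errors on duplicates instead, which is the alternative behaviour Joseph mentions.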

Re: UDF Method overloading

2015-07-30 Thread Joe Halliwell
Hi Sachith, Yes, that's possible; you just need to implement https://hive.apache.org/javadocs/r0.10.0/api/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.html Note the class documentation: A Generic User-defined function (GenericUDF) for the use with Hive. New GenericUDF classes need to inherit
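
A compact, hedged sketch of what that could look like for the multiply example, assuming Hive's GenericUDF API (illustrative only, argument-category validation omitted):

import org.apache.hadoop.hive.ql.exec.UDFArgumentException
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject
import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspector, PrimitiveObjectInspector}
import org.apache.hadoop.hive.serde2.objectinspector.primitive.{PrimitiveObjectInspectorFactory, PrimitiveObjectInspectorUtils}

// One GenericUDF handles any number of arguments, which is how Hive sidesteps the
// need for true method overloading.
class GenericUDFMultiply extends GenericUDF {
  private var inputOIs: Array[PrimitiveObjectInspector] = _

  override def initialize(arguments: Array[ObjectInspector]): ObjectInspector = {
    if (arguments.length < 2) {
      throw new UDFArgumentException("multiply expects at least two arguments")
    }
    // Real code would check the argument categories before casting.
    inputOIs = arguments.map(_.asInstanceOf[PrimitiveObjectInspector])
    PrimitiveObjectInspectorFactory.javaDoubleObjectInspector  // return type: double
  }

  override def evaluate(arguments: Array[DeferredObject]): AnyRef = {
    var product = 1.0
    for (i <- arguments.indices) {
      val value = arguments(i).get()
      if (value == null) return null
      product *= PrimitiveObjectInspectorUtils.getDouble(value, inputOIs(i))
    }
    java.lang.Double.valueOf(product)
  }

  override def getDisplayString(children: Array[String]): String =
    "multiply(" + children.mkString(", ") + ")"
}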

UDF Method overloading

2015-07-30 Thread Sachith Withana
Hi all, Does Spark support UDF method overloading? For example, I want to have a UDF with a varying number of arguments: multiply(a,b), multiply(a,b,c). Any suggestions? -- Thanks, Sachith Withana

Machine learning unit tests guidelines

2015-07-30 Thread Ulanov, Alexander
Dear Spark developers, Are there any best practices or guidelines for machine learning unit tests in Spark? After taking a brief look at the unit tests in ML and MLlib, I have found that each algorithm is tested in a different way. There are a few kinds of tests: 1) Partial check of internal
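
As one concrete point of comparison, here is a hedged sketch of the black-box style of test (train on a tiny hand-built dataset and assert on predictions rather than internals); the suite name and data are made up for illustration:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.scalatest.FunSuite

class TinyLogisticRegressionSuite extends FunSuite {
  test("separates a trivially separable 1-D dataset") {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("tiny-lr-test"))
    try {
      // Two well-separated clusters; any reasonable learner should get this right.
      val data = sc.parallelize(Seq(
        LabeledPoint(0.0, Vectors.dense(-1.0)),
        LabeledPoint(0.0, Vectors.dense(-0.8)),
        LabeledPoint(1.0, Vectors.dense(0.8)),
        LabeledPoint(1.0, Vectors.dense(1.0))))
      val model = LogisticRegressionWithSGD.train(data, numIterations = 100)
      assert(model.predict(Vectors.dense(-0.5)) === 0.0)
      assert(model.predict(Vectors.dense(0.5)) === 1.0)
    } finally {
      sc.stop()
    }
  }
}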