I turned on TRACE and I see a lot of the following exception:
java.lang.IllegalAccessError: com/google/protobuf/ZeroCopyLiteralByteString
        at org.apache.hadoop.hbase.protobuf.RequestConverter.buildRegionSpecifier(RequestConverter.java:897)
        at
Hi,
I am working with standalone HBase, and I want to execute HBaseTest.scala
(in the Scala examples).
I have created a test table with three rows, and I just want to get the
count using HBaseTest.scala.
I am getting this issue:
15/04/29 11:17:10 INFO BlockManagerMaster: Registered BlockManager
Or multiple volumes. The LOCAL_DIRS (YARN) and SPARK_LOCAL_DIRS (Mesos,
Standalone) environment variables and the spark.local.dir property control
where temporary data is written. The default is /tmp.
See
http://spark.apache.org/docs/latest/configuration.html#runtime-environment
for more details.
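For example, pointing Spark at two volumes would look like this (a minimal sketch; the mount paths are illustrative):
val conf = new org.apache.spark.SparkConf()
  .set("spark.local.dir", "/mnt/disk1/tmp,/mnt/disk2/tmp") // comma-separated list of scratch dirs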
Makes sense. / is where /tmp would be. However, 230G should be plenty of
space. If you have INFO logging turned on (set in
$SPARK_HOME/conf/log4j.properties), you'll see messages about saving data
to disk that will list sizes. The web console also has some summary
information about this.
dean
Hi all,
I executed a simple SQL query and got unacceptable performance.
I do the following steps:
1. Apply a schema to an RDD and register a table.
2. Run a SQL query which returns several entries:
The running time for this query is 0.2s (the table contains 10 entries). I think
that Spark SQL
Hi,
I'm using Spark (1.3.1) MLlib to run the random forest algorithm on TF-IDF
output; the training data is a file containing 156060 (size 8.1M).
The problem is that when trying to persist a partition into memory and there
is not enough memory, the partition is persisted on disk, and despite having
Hi, I have a ddf with schema (CustomerID, SupplierID, ProductID, Event,
CreatedOn), the first 3 are Long ints and event can only be 1,2,3 and
CreatedOn is a timestamp. How can I make a group triplet/doublet/singlet out
of them such that I can infer that Customer registered event from 1 to 2 and
if
Hi everyone,
what is the most efficient way to filter a DataFrame on a column from
another DataFrame's column? The best idea I had was to join the two
dataframes:
val df1: DataFrame
val df2: DataFrame
df1.join(df2, df1("id") === df2("id"), "inner")
But I end up (obviously) with the id column
Please use user@, not dev@
This message does not appear to be from your driver. It also doesn't say
you ran out of memory. It says you didn't tell YARN to let it use the
memory you want. Look at the memory overhead param and please search first
for related discussions.
On Apr 29, 2015 11:43 AM,
I would use the ps command on each machine while the job is running to
confirm that every process involved is running as root.
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler
Do you have multiple disks? Maybe your work directory is not on the right
disk?
On Wed, Apr 29, 2015 at 4:43 PM, Selim Namsi selim.na...@gmail.com wrote:
Hi,
I'm using Spark (1.3.1) MLlib to run the random forest algorithm on TF-IDF
output; the training data is a file containing 156060 (size
This is the output of df -h so as you can see I'm using only one disk
mounted on /
df -h
Filesystem      Size  Used  Avail  Use%  Mounted on
/dev/sda8       276G   34G   229G   13%  /
none            4.0K     0   4.0K    0%  /sys/fs/cgroup
udev            7.8G  4.0K   7.8G    1%  /dev
tmpfs           1.6G
Hi Sparkers,
I am trying to run MLlib KMeans on a large dataset (50+ GB of data) and a
large K, but I've encountered the following issues:
- The Spark driver runs out of memory and dies because collect gets called
as part of KMeans, which loads all the data back into the driver's memory.
- At the
How are you passing the feature vectors to K-means?
Are they in 2-D space or a 1-D array?
Did you try using Streaming K-means?
Will you be able to paste code here?
On 29 April 2015 at 17:23, Sam Stoelinga sammiest...@gmail.com wrote:
Hi Sparkers,
I am trying to run MLlib KMeans on a large dataset (50+ GB
I'm mostly using example code, see here:
http://paste.openstack.org/show/211966/
The data has 799305 dimensions and is separated by spaces.
Please note that the issues I'm seeing are, in my opinion, in the Scala
implementation, as they also happen when using the Python wrappers.
On Wed, Apr 29, 2015 at 8:00
The memory leak could be related to this
https://issues.apache.org/jira/browse/SPARK-5967 defect that was resolved
in Spark 1.2.2 and 1.3.0.
@Sean
Will it be backported to CDH? I didn't find that bug in the CDH 5.4 release
notes.
2015-04-29 14:51 GMT+02:00 Conor Fennell conor.fenn...@altocloud.com:
Sorry but I didn't fully understand the grouping. This line:
The group must only take the closest previous trigger. The first one
hence shows alone.
Can you please explain further?
On Wed, Apr 29, 2015 at 4:42 PM, bipin bipin@gmail.com wrote:
Hi, I have a ddf with schema (CustomerID,
Sorry I put the log messages when creating the thread in
http://apache-spark-user-list.1001560.n3.nabble.com/java-io-IOException-No-space-left-on-device-td22702.html
but I forgot that raw messages will not be sent in emails.
So this is the log related to the error :
15/04/29 02:48:50 INFO
You can use .select to project only columns you need
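For instance (a sketch; "id" is the join key, and the other column names are illustrative):
val filtered = df1.join(df2, df1("id") === df2("id"), "inner")
  .select(df1("id"), df1("name"), df1("value")) // keep only df1's columns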
On Wed, Apr 29, 2015 at 9:23 PM, Olivier Girardot ssab...@gmail.com wrote:
Hi everyone,
what is the most efficient way to filter a DataFrame on a column from
another DataFrame's column? The best idea I had was to join the two
dataframes :
Guys, great feedback by pointing out my stupidity :D
Rows and columns got intermixed, hence the weird results I was seeing.
Ignore my previous issues; I will reformat my data first.
On Wed, Apr 29, 2015 at 8:47 PM, Sam Stoelinga sammiest...@gmail.com
wrote:
I'm mostly using example code, see here:
Hi all,
I was testing the DataFrame filter functionality and I found what I think
is a strange behaviour.
My dataframe testDF, obtained by loading a MySQL table via JDBC, has the
following schema:
root
 |-- id: long (nullable = false)
 |-- title: string (nullable = true)
 |-- value: string
The memory leak could be related to this
https://issues.apache.org/jira/browse/SPARK-5967 defect that was resolved
in Spark 1.2.2 and 1.3.0.
It also was a HashMap causing the issue.
-Conor
On Wed, Apr 29, 2015 at 12:01 PM, Sean Owen so...@cloudera.com wrote:
Please use user@, not dev@
Looks like your DF is based on a MySQL DB using JDBC, and the error is thrown
from MySQL. Can you see what SQL finally gets fired in MySQL? Spark
is pushing down the predicate to MySQL, so it's not a Spark problem per se.
On Wed, Apr 29, 2015 at 9:56 PM, Francesco Bigarella
Yes, and Kafka topics are basically queues. So perhaps what's needed is just
KafkaRDD with starting offset being 0 and finish offset being a very large
number...
Sent from my iPhone
On Apr 29, 2015, at 1:52 AM, ayan guha guha.a...@gmail.com wrote:
I guess what you mean is not streaming.
Correcting myself:
For SparkContext#wholeTextFiles, the RDD's elements are KV pairs; the key is
the file path, and the value is the file content.
So, for SparkContext#wholeTextFiles, the RDD already carries the file
information.
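A minimal sketch (the input path is hypothetical):
val files = sc.wholeTextFiles("hdfs:///data/input") // RDD[(String, String)] of (path, content)
files.foreach { case (path, content) =>
  println(s"$path -> ${content.length} chars")
}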
bit1...@163.com
From: Saisai Shao
Date: 2015-04-29 15:50
On 04/27/2015 06:00 PM, Ganelin, Ilya wrote:
Marco - why do you want data sorted both within and across partitions? If you
need to take an ordered sequence across all your data you need to either
aggregate your RDD on the driver and sort it, or use zipWithIndex to apply an
ordered index
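A minimal sketch of the zipWithIndex route (assuming the element type has an Ordering):
val rdd = sc.parallelize(Seq(3, 1, 4, 1, 5))
// sortBy gives a total order across partition boundaries;
// zipWithIndex then attaches a stable global index to each element.
val indexed = rdd.sortBy(identity).zipWithIndex() // RDD[(Int, Long)]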
Hi,
I want to check the DEBUG log of a Spark executor in Standalone deploy mode.
But:
1. I set log4j.properties in the spark/conf folder on the master node and
restarted the cluster.
None of the means above works.
2. I used spark-submit --properties-file log4j.
It just prints the debug log to the screen, but the executor log still seems to
Hi,
I am trying to implement equal depth and equal height binning methods in
spark.
Any insights, existing code for this would be really helpful.
Thanks,
Kundan
Yes, it looks like a solution, but quite tricky. You have to parse the debug
string to get the file name, and it also relies on HadoopRDD to get the file
name :)
2015-04-29 14:52 GMT+08:00 Akhil Das ak...@sigmoidanalytics.com:
It is possible to access the filename; it's a bit tricky though.
val fstream
It is possible to access the filename; it's a bit tricky though.
val fstream = ssc.fileStream[LongWritable, IntWritable,
  SequenceFileInputFormat[LongWritable, IntWritable]]("/home/akhld/input/")
fstream.foreach(x => {
  // You can get it with this object.
Hi,
Good question. The extra memory comes from
spark.yarn.executor.memoryOverhead, the space used for the application
master, and the way the YARN rounds requests up. This explains it in a
little more detail:
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
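For reference, bumping the overhead looks like this (a sketch; the value is illustrative and can also be passed via --conf on spark-submit):
val conf = new org.apache.spark.SparkConf()
  .set("spark.yarn.executor.memoryOverhead", "1024") // off-heap headroom, in MB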
I guess what you mean is not streaming. If you create a stream context at
time t, you will receive data coming through starting after time t, not before
it.
Looks like you want a queue. Let Kafka write to a queue, consume messages from
the queue, and stop when the queue is empty.
On 29 Apr 2015 14:35,
Thank you for your Answer!
Yes I would like to work on it.
Selim
On Mon, Apr 27, 2015 at 5:23 AM Joseph Bradley jos...@databricks.com
wrote:
Unfortunately, the Pipelines API doesn't have multiclass logistic
regression yet, only binary. It's really a matter of modifying the current
Hi Toni.
Given there is more than one measure per (user, hour), what is the measure
you want to keep? The sum? The mean? The most recent measure? For the
sum or the mean you don't need to care about the timing. And if you want
to have the most recent, then you can include the timestamp in the
Thanks Sandy, it is very useful!
bit1...@163.com
From: Sandy Ryza
Date: 2015-04-29 15:24
To: bit1...@163.com
CC: user
Subject: Re: Question about Memory Used and VCores Used
Hi,
Good question. The extra memory comes from spark.yarn.executor.memoryOverhead,
the space used for the
The issue is solved. There was a problem in my hive codebase. Once that was
fixed, -Phive-provided spark is working fine against my hive jars.
On 27 April 2015 at 08:00, Manku Timma manku.tim...@gmail.com wrote:
Made some progress on this. Adding hive jars to the system classpath is
needed.
Thanks for the detailed information!
Now I can confirm that this is a backwards-compatibility issue. The data
written by parquet 1.6rc7 follows the standard LIST structure. However,
Spark SQL still uses old parquet-avro style two-level structures, which
causes the problem.
Cheng
On 4/27/15
Wrapping the old LogisticRegressionWithLBFGS could be a quick solution
for 1.4, and it's not too hard, so it could potentially get into 1.4. In
the long term, I would like to implement a new version like
https://github.com/apache/spark/commit/6a827d5d1ec520f129e42c3818fe7d0d870dcbef
which handles the
I have multiple DataFrame objects, each stored in a parquet file. The DataFrame
just contains 3 columns (id, value, timeStamp). I need to union all the
DataFrame objects together, but for duplicated ids keep only the record with the
latest timestamp. How can I do that?
I can do this for RDDs
Hadoop version doesn't matter if you're just using Cassandra.
On Wed, Apr 29, 2015 at 12:08 PM, Matthew Johnson matt.john...@algomi.com
wrote:
Hi all,
I am new to Spark, but excited to use it with our Cassandra cluster. I
have read in a few places that Spark can interact directly with
Not sure what you mean. It's already in CDH since 5.4 = 1.3.0
(This isn't the place to ask about CDH)
I also don't think that's the problem. The process did not run out of
memory.
On Wed, Apr 29, 2015 at 2:08 PM, Serega Sheypak serega.shey...@gmail.com
wrote:
The memory leak could be related to
Hi all,
I am new to Spark, but excited to use it with our Cassandra cluster. I have
read in a few places that Spark can interact directly with Cassandra now,
so I decided to download it and have a play – I am happy to run it in
standalone cluster mode initially. When I go to download it (
I'm running this query with different parameters on the same RDD and got 0.2s
for each query.
Hey Roberto,
You will likely want to use a cogroup() then, but it all hinges on how your
data looks, i.e. if you have the index in the key. Here's an example:
http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html#cogroup
.
Clone: RDDs are immutable, so if you need to make changes
Dear Sparkers,
I am working on an algorithm which requires the pairwise distance between all
points (e.g. DBScan, LOF, etc.). Computing this for *n* points will produce
an n^2 matrix. If the distance measure is symmetric, this can be
reduced to n^2/2. What would be the most optimal way of
On Mon, Apr 27, 2015 at 7:36 AM, Shuai Zheng szheng.c...@gmail.com wrote:
Thanks. So may I know your configuration for more/smaller
executors on r3.8xlarge: how much memory did you eventually decide
to give one executor without impacting performance (for example, 64g)?
We're
Hi,
I have a 2 billion records dataset with schema eventId: String, time: Double,
value: Double. It is stored in Parquet format in HDFS, size 23GB. Specs: Spark
1.3, Hadoop 1.2.1, 8 nodes with Xeon 16GB RAM, 1TB disk space, each node has 3
workers with 3GB memory.
I keep failing to sort the
Hi all,
I'm new to Spark, so I'm sorry if the question is too vague. I'm currently
trying to deploy a Spark cluster using YARN on an Amazon EMR cluster. For
data storage I'm currently using S3, but would loading the data into HDFS
from a local node give a considerable performance advantage over
http://planetcassandra.org/getting-started-with-apache-spark-and-cassandra/
http://planetcassandra.org/blog/holy-momentum-batman-spark-and-cassandra-circa-2015-w-datastax-connector-and-java/
https://github.com/datastax/spark-cassandra-connector
From: Cody Koeninger [mailto:c...@koeninger.org]
Thanks for the responses.
Try removing toDebugString and see what happens.
The toDebugString is performed after [d] (the action), as [e]. By then all
stages are already executed.
I'm not sure, but I wonder if because you are using the Spark REPL that it
may not be representing what a normal runtime execution would look like and
is possibly eagerly running a partial DAG once you define an operation that
would cause a shuffle.
What happens if you setup your same set of
Hi, I am trying to see exactly what happens under the hood of Spark when
performing a simple sortByKey. So far I've already discovered the
fetch-files and both the temp-shuffle and shuffle files being written to
disk, but there is still an extra stage that keeps on puzzling me.
This is the
Hi all,
I have to estimate resource requirements for my Hadoop/Spark cluster. In
particular, I have to query about 100 TB of an HBase table to do aggregation
with Spark SQL.
What is, approximately, the most suitable cluster configuration for my use
case, in order to query the data in a fast way? At last
Thanks for the suggestion. I ran the command and the limit is 1024.
Based on my understanding, the connector to Kafka should not open so many
files. Do you think there is possible socket leakage? BTW, in every batch
which is 5 seconds, I output some results to MySQL:
def ingestToMysql(data:
Hi,
Following the Quick Start guide:
https://spark.apache.org/docs/latest/quick-start.html
I could compile and run a Spark program successfully, now my question is
how to
compile multiple programs with sbt in a bunch. E.g, two programs as:
./src
./src/main
./src/main/scala
Can you run the command 'ulimit -n' to see the current limit ?
To configure ulimit settings on Ubuntu, edit */etc/security/limits.conf*
Cheers
On Wed, Apr 29, 2015 at 2:07 PM, Bill Jay bill.jaypeter...@gmail.com
wrote:
Hi all,
I am using the direct approach to receive real-time data from
Maybe add statement.close() in a finally block?
Streaming / Kafka experts may have better insight.
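Something along these lines (a sketch; the JDBC URL, credentials, and table are hypothetical stand-ins for the snippet above):
import java.sql.DriverManager
def ingestToMysql(data: Iterator[String]): Unit = {
  val conn = DriverManager.getConnection("jdbc:mysql://host/db", "user", "pass")
  val statement = conn.createStatement()
  try {
    data.foreach(row => statement.executeUpdate(s"INSERT INTO results VALUES ('$row')"))
  } finally {
    statement.close() // released even if an insert throws
    conn.close()
  }
}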
On Wed, Apr 29, 2015 at 2:25 PM, Bill Jay bill.jaypeter...@gmail.com
wrote:
Thanks for the suggestion. I ran the command and the limit is 1024.
Based on my understanding, the connector to Kafka
It's no different; you would use group by and an aggregate function to do so.
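Concretely, something like this (a sketch; df1 and df2 stand for the parquet-backed frames):
import org.apache.spark.sql.functions._
val unioned = df1.unionAll(df2)
val latest = unioned.groupBy("id").agg(max("timeStamp").as("maxTs"))
// Join back to recover the full row carrying the latest timestamp per id.
val result = unioned.join(latest,
    unioned("id") === latest("id") && unioned("timeStamp") === latest("maxTs"))
  .select(unioned("id"), unioned("value"), unioned("timeStamp"))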
On 30 Apr 2015 02:15, Wang, Ningjun (LNG-NPV) ningjun.w...@lexisnexis.com
wrote:
I have multiple DataFrame objects each stored in a parquet file. The
DataFrame just contains 3 columns (id, value, timeStamp). I need to
Hi all,
I am using the direct approach to receive real-time data from Kafka in the
following link:
https://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
My code follows the word count direct example:
Is the function ingestToMysql running on the driver or on the executors?
Accordingly you can try debugging while running in a distributed manner,
with and without calling the function.
If you don't get too many open files without calling ingestToMysql(), the
problem is likely to be in
Have you taken a look at the join section in the streaming programming
guide?
http://spark.apache.org/docs/latest/streaming-programming-guide.html#stream-dataset-joins
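The core pattern there looks roughly like this (a sketch; assumes both sides are keyed by userId):
// visits: DStream[(String, Long)]; transactions: RDD[(String, Double)]
val joined = visits.transform { rdd => rdd.join(transactions) }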
On Wed, Apr 29, 2015 at 7:11 AM, Rendy Bambang Junior
rendy.b.jun...@gmail.com wrote:
Let's say I have transaction data and
Hi,
Is it possible to register kryo serialization for classes contained in jars
that are added with spark.jars? In my experiment it doesn't seem to
work, likely because the class registration happens before the jar is
shipped to the executor and added to the classloader. Here's the general
idea
This function is called in foreachRDD. I think it should be running in the
executors. I added the statement.close() in the code and it is running. I
will let you know if this fixes the issue.
On Wed, Apr 29, 2015 at 4:06 PM, Tathagata Das t...@databricks.com wrote:
Is the function ingestToMysql
This is my first thought; please suggest any further improvement (see the
sketch after the list):
1. Create an RDD of your dataset.
2. Do a cross join to generate pairs.
3. Apply reduceByKey and compute the distance. You will get an RDD with key
pairs and distances.
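A rough sketch of those steps, using cartesian for the cross join and a plain map (rather than reduceByKey) for the distance; assumes points is an RDD[(Long, Vector)] of (id, point) pairs:
import org.apache.spark.mllib.linalg.{Vector, Vectors}
val pairs = points.cartesian(points)
  .filter { case ((i, _), (j, _)) => i < j } // symmetry: keep each unordered pair once
val distances = pairs.map { case ((i, u), (j, v)) =>
  ((i, j), math.sqrt(Vectors.sqdist(u, v))) // (id pair) -> Euclidean distance
}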
Best
Ayan
On 30 Apr 2015 06:11, Driesprong, Fokko fo...@driesprong.frl
Hi Ted,
I will have a look at it , thanks a lot.
Cheers,
Dan
On Apr 29, 2015, at 5:00 PM, Ted Yu yuzhih...@gmail.com wrote:
Have you looked at
http://www.scala-sbt.org/0.13/tutorial/Multi-Project.html ?
Cheers
On Wed, Apr 29, 2015 at 2:45 PM, Dan Dong dongda...@gmail.com wrote:
Hi,
I believe that the implicit def is pulling the enclosing class (in which
the def is defined) into the closure, which is not serializable.
On Wed, Apr 29, 2015 at 4:20 AM, guoqing0...@yahoo.com.hk
guoqing0...@yahoo.com.hk wrote:
Hi guys,
I'm puzzled why I can't use the implicit function in
Also cc'ing Cody.
@Cody, maybe there is a reason for doing connection pooling even if there is
no performance difference.
TD
On Wed, Apr 29, 2015 at 4:06 PM, Tathagata Das t...@databricks.com wrote:
Is the function ingestToMysql running on the driver or on the executors?
Accordingly you can
It could be related to this.
https://issues.apache.org/jira/browse/SPARK-6737
This was fixed in Spark 1.3.1.
On Wed, Apr 29, 2015 at 8:38 AM, Sean Owen so...@cloudera.com wrote:
Not sure what you mean. It's already in CDH since 5.4 = 1.3.0
(This isn't the place to ask about CDH)
I also
Have you looked at http://www.scala-sbt.org/0.13/tutorial/Multi-Project.html
?
Cheers
On Wed, Apr 29, 2015 at 2:45 PM, Dan Dong dongda...@gmail.com wrote:
Hi,
Following the Quick Start guide:
https://spark.apache.org/docs/latest/quick-start.html
I could compile and run a Spark program
Part of the issue is, when you read messages in a topic, the messages are
peeked, not polled, so there is no "when the queue is empty," as I
understand it.
So it would seem I'd want to do KafkaUtils.createRDD, which takes an array
of OffsetRange's. Each OffsetRange is characterized by topic,
Can you verify whether the HBase release you're using has the following fix?
HBASE-8 non-environment-variable solution for IllegalAccessError
Cheers
On Tue, Apr 28, 2015 at 10:47 PM, Tridib Samanta tridib.sama...@live.com
wrote:
I turned on TRACE and I see a lot of the following exception:
Let's say I have transaction data and visit data
visit
| userId | Visit source | Timestamp |
| A | google ads | 1 |
| A | facebook ads | 2 |
transaction
| userId | total price | timestamp |
| A | 100 | 248384 |
| B | 200 | 43298739 |
I
The idea of peek vs. poll doesn't apply to Kafka, because Kafka is not a
queue.
There are two ways of doing what you want: either using KafkaRDD or a
direct stream.
The KafkaRDD approach would require you to find the beginning and ending
offsets for each partition. For an example of this, see
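In the meantime, a rough sketch of the KafkaRDD route (broker address, topic, and offsets are illustrative):
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}
val kafkaParams = Map("metadata.broker.list" -> "broker:9092")
val offsetRanges = Array(
  OffsetRange("mytopic", 0, 0L, 100000L) // topic, partition, fromOffset, untilOffset
)
val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
  sc, kafkaParams, offsetRanges)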
looks like you need this:
lst = [[10001, 132, 2002, 1, '2012-11-23'],
       [10001, 132, 2002, 1, '2012-11-24'],
       [10031, 102, 223, 2, '2012-11-24'],
       [10001, 132, 2002, 2, '2012-11-25'],
       [10001, 132, 2002, 3, '2012-11-26']]
base = sc.parallelize(lst, 1).map(lambda x:
I think you'd probably want to look at combineByKey. I'm on my phone so I can't
give you an example, but that's one solution I would try.
You would then take the resulting RDD and go back to a DF if needed.
From: bipinmailto:bipin@gmail.com
Sent: 4/29/2015
# disclaimer I'm an employee of Elastic (the company behind Elasticsearch) and
lead of Elasticsearch Hadoop integration
Some things to clarify on the Elasticsearch side:
1. Elasticsearch is a distributed, real-time search and analytics engine. Search is just one aspect of it and it can
work
You mean after joining? Sure, my question was more whether there is any best
practice preferable to joining the other dataframe for filtering.
Regards,
Olivier.
On Wed, Apr 29, 2015 at 1:23 PM, Olivier Girardot ssab...@gmail.com wrote:
Hi everyone,
what is the most efficient way to filter a
Could you put the implicit def in an object? That should work, as objects
are never serialized.
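A minimal sketch of that arrangement (the names are hypothetical):
object Conversions {
  implicit def stringToRecord(s: String): Record = Record(s)
}
case class Record(value: String)
// At the use site, bring the conversion into scope:
import Conversions._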
On Wed, Apr 29, 2015 at 6:28 PM, guoqing0...@yahoo.com.hk
guoqing0...@yahoo.com.hk wrote:
Thank you for your pointers, they're very helpful to me. In this scenario
how can I use the implicit def in
No, sorry this is not supported. Support for more than one database is
lacking in several areas (though it mostly works for Hive tables). I'd like
to fix this in Spark 1.5.
On Tue, Apr 28, 2015 at 1:54 AM, James Aley james.a...@swiftkey.com wrote:
Hey all,
I'm trying to create tables from
After a day of debugging (actually, more), I can answer my question:
The problem is that the default value 200 of spark.sql.shuffle.partitions is
too small for sorting 2B rows. It was hard to realize because Spark executors
just crash with various exceptions one by one. The other takeaway is that
Cross-join shuffle space might not be needed, since most likely through
application-specific logic (topK etc.) you can cut the shuffle space... Also,
most likely the brute-force approach will be a benchmark tool to see how
much better your clustering-based KNN solution is, since there are several
ways you
Thank you for your pointers, they're very helpful to me. In this scenario, how
can I use the implicit def in the enclosing class?
From: Tathagata Das
Date: 2015-04-30 07:00
To: guoqing0...@yahoo.com.hk
CC: user
Subject: Re: implicit function in SparkStreaming
I believe that the implicit def is
Hi all
We know that Spark supports Kryo serialization. Suppose there is a map
function which maps C to (K, V) (here the values are instances of classes
C, K, V); when we register Kryo serialization, should I register all three
of these classes?
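For what it's worth, registering all three would look like this (a sketch; C, K, V stand for your classes):
val conf = new org.apache.spark.SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[C], classOf[K], classOf[V]))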
Best Wishes
Triones Deng
Hi,
Follow the instructions to install on the following link:
http://mbonaci.github.io/mbo-spark/
You don't need to install Spark on every node. Just install it on one node,
or you can install it on a remote system as well and make a Spark cluster.
Thanks
Madhvi
On Thursday 30 April 2015 09:31 AM,
For your question, I think the discussion in this link can help.
http://apache-spark-user-list.1001560.n3.nabble.com/Error-related-to-serialisation-in-spark-streaming-td6801.html
Best regards,
Lin Hao XU
IBM Research China
Email: xulin...@cn.ibm.com
My Flickr:
I have the real DEBS-Taxi data in a CSV file. In order to operate over it,
how can I simulate a Spout-like event generator using the
timestamps in the CSV file?
--
SERC-IISC
Thanks Regards,
Anshu Shukla
As I understand from SQL, group by allows you to do sum(), average(), max(),
min(). But how do I select the entire row in the group with the maximum
timeStamp column? For example:
id1, value1, 2015-01-01
id1, value2, 2015-01-02
id2, value3, 2015-01-01
id2, value4, 2015-01-02
I want to return
id1,
Use lsof to see what files are actually being held open.
That stacktrace looks to me like it's from the driver, not executors.
Where in foreach is it being called? The outermost portion of foreachRDD
runs in the driver, the innermost portion runs in the executors. From the
docs:
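That design pattern, roughly, in the spirit of the guide's example (a sketch; createNewConnection and send are hypothetical stand-ins):
dstream.foreachRDD { rdd =>
  // This block runs on the driver, once per batch.
  rdd.foreachPartition { partition =>
    // This block runs on the executors.
    val connection = createNewConnection() // stand-in connection factory
    partition.foreach(record => connection.send(record)) // stand-in send
    connection.close()
  }
}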
Hi experts,
I see Spark on YARN has yarn-client and yarn-cluster modes. I also have a
5-node Hadoop cluster (Hadoop 2.4). How do I install Spark if I want to try
the Spark-on-YARN mode?
Do I need to install Spark on each node of the Hadoop cluster?
Thanks,
Xiaohe
Appreciate your help, it works. I'm curious why the enclosing class
cannot be serialized; does it need to extend java.io.Serializable? If an
object is never serialized, how does it work in the task? Is there any
association with the spark.closure.serializer?
guoqing0...@yahoo.com.hk
Thanks for the comments, Cody.
Granted, Kafka topics aren't queues. I was merely wishing that Kafka's
topics had some queue behaviors supported because often that is exactly
what one wants. The ability to poll messages off a topic seems like what
lots of use-cases would want.
I'll explore both
On 4/29/15 6:02 PM, Jeetendra Gangele wrote:
Thanks for the detailed explanation. My only worry is that searching all the
combinations of company names through ES looks hard.
I'm not sure what makes you think ES looks hard. Have you tried browsing the Elasticsearch reference or the definitive
guide?