user-unsubscr...@spark.apache.org

2017-06-07 Thread williamtellme123
user-unsubscr...@spark.apache.org

 

From: kundan kumar [mailto:iitr.kun...@gmail.com] 
Sent: Wednesday, June 7, 2017 5:15 AM
To: 颜发才(Yan Facai) 
Cc: spark users 
Subject: Re: Convert the feature vector to raw data

 

Hi Yan, 

 

This doesn't work.

 

thanks,

kundan

 

On Wed, Jun 7, 2017 at 2:53 PM, 颜发才(Yan Facai) wrote:

Hi, kumar.

How about removing the `select` in your code?

namely,

Dataset<Row> result = model.transform(testData);

result.show(1000, false);





 

On Wed, Jun 7, 2017 at 5:00 PM, kundan kumar wrote:

I am using 

 

Dataset result = model.transform(testData).select("probability", 
"label","features");

 result.show(1000, false);

 

In this case the feature vector is being printed as output. Is there a way to
print my original raw data instead of the feature vector, or a way to
reverse-extract my raw data from the feature vector? All of the features in
my dataset are categorical in nature.

 

Thanks,

Kundan

 

 






user-unsubscr...@spark.apache.org

2017-05-30 Thread williamtellme123
 

 

From: Joel D [mailto:games2013@gmail.com] 
Sent: Monday, May 29, 2017 9:04 PM
To: user@spark.apache.org
Subject: Schema Evolution Parquet vs Avro

 

Hi,

 

We are trying to come up with the best storage format for handling schema 
changes in ingested data.

 

We noticed that both Avro and Parquet allow one to select based on column name
instead of the index/position of the data. However, we are inclined towards
Parquet for better read performance, since it is columnar and we will be
selecting a few columns instead of all of them. Data will be processed and
saved to partitions, on which we will have Hive external tables.

 

Will Parquet be able to handle the following?

- Renaming of a column in the middle of the schema

- Removal of a column from the middle of the schema

- Data type change of an existing column (int to bigint should be allowed, right?)

 

Please advise. 

 

Thanks,

Sam
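
For context, a hedged sketch of checking this in Spark (the path is
hypothetical): the mergeSchema option unions the schemas of Parquet part
files, which handles added and removed columns, but a rename just looks like
a drop plus an add (old files read the new column as null), and conflicting
types such as int vs. bigint across files typically fail to merge rather
than up-cast.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.builder()
        .appName("schema-evolution-check")
        .getOrCreate();

    // Union the schemas of all part files under the partitioned layout.
    Dataset<Row> df = spark.read()
        .option("mergeSchema", "true")
        .parquet("/data/ingested");   // hypothetical path
    df.printSchema();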



dev-unsubscr...@spark.apache.org

2017-05-30 Thread williamtellme123
 

 

From: Georg Heiler [mailto:georg.kf.hei...@gmail.com] 
Sent: Monday, May 29, 2017 2:23 PM
To: Spark Dev List 
Subject: Generic datasets implicit encoder missing

 

Hi,

 

Does anyone know what is wrong with using a generic type to construct a
Dataset (https://stackoverflow.com/q/44247874/2587904)? Even though the
implicits are imported, the encoder is reported as missing.

 

Regards Georg 
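
For reference, a hedged Java-side analogue of the usual fix (the helper and
names are illustrative, not from the thread): for an unconstrained type
parameter no encoder can be resolved automatically, so pass one explicitly.

    import java.util.List;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.SparkSession;

    // The caller supplies the class token, which drives Encoders.bean.
    static <T> Dataset<T> toDataset(SparkSession spark, List<T> data, Class<T> cls) {
        return spark.createDataset(data, Encoders.bean(cls));
    }

The Scala equivalent is to give the generic method an Encoder[T] context
bound (or implicit parameter) instead of relying on spark.implicits._ alone.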



user-unsubscr...@spark.apache.org

2017-05-26 Thread williamtellme123
user-unsubscr...@spark.apache.org

 

From: ANEESH .V.V [mailto:aneeshnair.ku...@gmail.com] 
Sent: Friday, May 26, 2017 1:50 AM
To: user@spark.apache.org
Subject: unsubscribe

 

unsubscribe



user-unsubscr...@spark.apache.org

2017-05-25 Thread williamtellme123
 

 

From: Steffen Schmitz [mailto:steffenschm...@hotmail.de] 
Sent: Thursday, May 25, 2017 3:34 AM
To: ramnavan 
Cc: user@spark.apache.org
Subject: Re: Questions regarding Jobs, Stages and Caching

 

 



user-unsubscr...@spark.apache.org

2017-05-23 Thread williamtellme123
 

 

From: Arun [mailto:arunbm...@gmail.com] 
Sent: Saturday, May 20, 2017 9:48 PM
To: user@spark.apache.org
Subject: Rmse recomender system

 

 

Hi all,

I am new to machine learning.

I am working on a recommender system. For the training dataset the RMSE is 0.08, while on
the test data it is 2.345.

What conclusion should I draw, and what steps can I take to improve?

 

 

 

Sent from Samsung tablet
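
A training RMSE of 0.08 against a test RMSE of 2.345 usually indicates
overfitting. A minimal sketch (Spark ML ALS; column names, parameter values,
and the training/test splits are hypothetical) of evaluating both splits and
adding regularization:

    import org.apache.spark.ml.evaluation.RegressionEvaluator;
    import org.apache.spark.ml.recommendation.ALS;
    import org.apache.spark.ml.recommendation.ALSModel;

    // training and test: Dataset<Row> splits of (userId, itemId, rating).
    ALS als = new ALS()
        .setUserCol("userId").setItemCol("itemId").setRatingCol("rating")
        .setRank(10)
        .setRegParam(0.1);   // raise regParam when test RMSE >> train RMSE
    ALSModel model = als.fit(training);

    RegressionEvaluator evaluator = new RegressionEvaluator()
        .setMetricName("rmse").setLabelCol("rating").setPredictionCol("prediction");
    double trainRmse = evaluator.evaluate(model.transform(training));
    double testRmse  = evaluator.evaluate(model.transform(test));

Comparing the two values after each change (rank, regParam, more data,
cross-validation) shows whether the generalization gap is actually closing.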



user-unsubscr...@spark.apache.org

2017-05-23 Thread williamtellme123
 

 

From: Abir Chakraborty [mailto:abi...@247-inc.com] 
Sent: Sunday, May 21, 2017 4:17 AM
To: user@spark.apache.org
Subject: unsubscribe

 

unsubscribe

 

 

 



user-unsubscr...@spark.apache.org

2017-05-23 Thread williamtellme123
 

 

From: Bibudh Lahiri [mailto:bibudhlah...@gmail.com] 
Sent: Sunday, May 21, 2017 9:34 AM
To: user 
Subject: unsubscribe

 

unsubscribe  



user-unsubscr...@spark.apache.org

2017-05-23 Thread williamtellme123
user-unsubscr...@spark.apache.org

 

From: 萝卜丝炒饭 [mailto:1427357...@qq.com] 
Sent: Sunday, May 21, 2017 8:15 PM
To: user 
Subject: Are tachyon and akka removed from 2.1.1 please

 

Hi all,

I read some papers about the source code; they are based on version 1.2 and
refer to Tachyon and Akka. When I read the 2.1 code, I cannot find the code
for Akka or Tachyon.

Have Tachyon and Akka been removed from 2.1.1?



unsubscribe

2017-05-20 Thread williamtellme123
unsubscribe

From: Abir Chakraborty [mailto:abi...@247-inc.com] 
Sent: Saturday, May 20, 2017 1:29 AM
To: user@spark.apache.org
Subject: unsubscribe

 

 



unsubscribe

2017-05-17 Thread williamtellme123
unsubscribe

 

From: Gerhard Fiedler [mailto:gfied...@algebraixdata.com] 
Sent: Wednesday, May 17, 2017 1:57 PM
To: dev@spark.apache.org
Subject: Any plans to release 1.6.4 and/or 2.0.3?

 

We’re interested in the Python 3.6 support that has been fixed in 2.1.1 and 
apparently has been backported to the unreleased versions 1.6.4 and 2.0.3.

 

Are there any plans to release these versions?

 

Thanks,

Gerhard

 



unsubscribe

2017-05-10 Thread williamtellme123
unsubscribe

 

From: Aaron Perrin [mailto:aper...@gravyanalytics.com] 
Sent: Tuesday, January 31, 2017 9:42 AM
To: user @spark 
Subject: Multiple quantile calculations

 

I want to calculate quantiles on two different columns. I know that I can
calculate them with two separate operations. However, for performance reasons,
I'd like to calculate both with one operation.

Is it possible to do this with the Dataset API? I'm assuming that it isn't.
But if it isn't, is it possible to calculate both in one pass, assuming that I
made some code changes? I briefly looked at the approxQuantile code, but I
haven't dug into the algorithm.
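
For reference, a hedged sketch against the DataFrame stat functions (column
names and the error tolerance are illustrative). In the 2.0/2.1 API each
approxQuantile call is a separate pass over the data; a multi-column overload
that computes several columns in one pass was only added later (Spark 2.2):

    // Two separate passes over df (a Dataset<Row>), one per column.
    double[] probabilities = {0.25, 0.5, 0.75};
    double[] quantilesA = df.stat().approxQuantile("colA", probabilities, 0.01);
    double[] quantilesB = df.stat().approxQuantile("colB", probabilities, 0.01);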

 



unsubscribe

2017-05-10 Thread williamtellme123
 

 

From: Aaron Jackson [mailto:ajack...@pobox.com] 
Sent: Tuesday, July 19, 2016 7:17 PM
To: user 
Subject: Heavy Stage Concentration - Ends With Failure

 

Hi,

 

I have a cluster with 15 nodes, of which 5 are HDFS nodes. I kick off a job
that creates some 120 stages. Eventually, the active and pending stages reduce
down to a small bottleneck, and it never fails: the 10 (or so) running tasks
are always allocated to the same executor on the same host.

Sooner or later, it runs out of memory, or some other resource. It falls over,
and then the tasks are reallocated to another executor.

Why do we see such heavy concentration of tasks onto a single executor when
other executors are free? Were the tasks assigned to an executor when the job
was decomposed into stages?
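
Two things often worth trying in this situation, sketched below with
illustrative values; this is a hedged guess at the cause (locality-driven
scheduling plus partition skew), not a diagnosis:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;

    // Stop the scheduler from waiting for data-local slots on the hot executor.
    SparkConf conf = new SparkConf().set("spark.locality.wait", "0s");

    // Spread a skewed dataset across the cluster before the heavy stage;
    // rdd and the partition count are hypothetical.
    JavaRDD<String> balanced = rdd.repartition(120);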



UNSUBSCRIBE PLEASE

2017-01-12 Thread williamtellme123
us...@kafka.apache.org;users-unsubscr...@kafka.apache.org;
users_unsubscr...@kafka.apache.org;
dev@kafka.apache.org; dev-unsubscr...@kafka.apache.org;
dev_unsubscr...@kafka.apache.org
-Original Message-
From: Raj Tanneru [mailto:rtann...@fanatics.com] 
Sent: Saturday, May 7, 2016 6:46 PM
To: us...@kafka.apache.org
Cc: dev@kafka.apache.org
Subject: Re: KAFKA-3112

Thanks Ismael and Tao. Appreciate it.

Sent from my iPhone

> On May 7, 2016, at 1:14 AM, Ismael Juma wrote:
>
> Hi Raj and Tao,
>
> I just merged the KAFKA-3112 PR, so this issue will be fixed in 0.10.0.0.
>
> Thanks,
> Ismael
>
>> On Fri, May 6, 2016 at 7:47 PM, tao xiao wrote:
>>
>> KAFKA-2657 is unresolved, so you can safely assume it hasn't been fixed yet.
>>
>>> On Fri, 6 May 2016 at 07:38 Raj Tanneru wrote:
>>>
>>> Yeah, it is a duplicate of KAFKA-2657. The question is how to check /
>>> know if it is merged to the 0.9.0.1 release. What are the options that I
>>> have if I need this fix? How can I get a patch for this on 0.8.2.1?
>>>
>>> Sent from my iPhone
>>>
>>>> On May 6, 2016, at 12:06 AM, tao xiao wrote:
>>>>
>>>> It said this is a duplication. This is the issue,
>>>> https://issues.apache.org/jira/browse/KAFKA-2657, that KAFKA-3112
>>>> duplicates.
>>>>
>>>>> On Thu, 5 May 2016 at 22:13 Raj Tanneru wrote:
>>>>>
>>>>> Hi All,
>>>>> Does anyone know if KAFKA-3112 is merged to 0.9.0.1? Is there a place
>>>>> to check which version has this fix? Jira doesn't show fix versions.
>>>>>
>>>>> https://issues.apache.org/jira/browse/KAFKA-3112
>>>>>
>>>>> Thanks,
>>>>> Raj Tanneru

Information contained in this e-mail message is confidential. This e-mail
message is intended only for the personal use of the recipient(s) named
above. If you are not an intended recipient, do not read, distribute or
reproduce this transmission (including any attachments). If you have
received this email in error, please immediately notify the sender by email
reply and delete the original message.



UNSUBSCRIBE PLEASE

2017-01-12 Thread williamtellme123


-Original Message-
From: Gerard Klijs [mailto:gerard.kl...@dizzit.com] 
Sent: Wednesday, May 11, 2016 3:00 AM
To: users@kafka.apache.org
Subject: Re: Backing up Kafka data and using it later?

You could create a Docker image with a Kafka installation and start a mirror
maker in it; you could set the retention time to infinite and mount the data
volume. With the data you could always restart the Docker container and
mirror it to somewhere else. Not sure that's what you want, but it's an
option for saving data for use at some other place/time.
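
For reference, a minimal Java sketch of what such mirroring boils down to: a
consume-and-republish loop, which MirrorMaker automates. Broker addresses,
topic, and group id below are hypothetical.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    Properties consumerProps = new Properties();
    consumerProps.put("bootstrap.servers", "source-cluster:9092");  // hypothetical
    consumerProps.put("group.id", "backup-mirror");
    consumerProps.put("auto.offset.reset", "earliest");
    consumerProps.put("key.deserializer",
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    consumerProps.put("value.deserializer",
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");

    Properties producerProps = new Properties();
    producerProps.put("bootstrap.servers", "backup-cluster:9092");  // hypothetical
    producerProps.put("key.serializer",
        "org.apache.kafka.common.serialization.ByteArraySerializer");
    producerProps.put("value.serializer",
        "org.apache.kafka.common.serialization.ByteArraySerializer");

    try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerProps);
         KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(producerProps)) {
        consumer.subscribe(Collections.singletonList("my-topic"));  // hypothetical topic
        while (true) {
            // Copy every record, byte for byte, to the backup cluster.
            ConsumerRecords<byte[], byte[]> records = consumer.poll(1000L);
            for (ConsumerRecord<byte[], byte[]> record : records) {
                producer.send(new ProducerRecord<>("my-topic", record.key(), record.value()));
            }
        }
    }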

On Wed, May 11, 2016 at 12:33 AM Alex Loddengaard wrote:

> You may find this interesting, although I don't believe it's exactly
> what you're looking for:
>
> https://github.com/pinterest/secor
>
> I'm not sure how stable and commonly used it is.
>
> Additionally, I see a lot of users use MirrorMaker for a "backup,"
> where MirrorMaker copies all topics from one Kafka cluster to another
> "backup" cluster. I put "backup" in quotes because this architecture
> doesn't support snapshotting like a traditional backup would. I realize
> this doesn't address your specific use case, but thought you may find it
> interesting regardless.
>
> Sorry I'm a little late to the thread, too.
>
> Alex
>
> On Thu, May 5, 2016 at 7:05 AM, Rad Gruchalski wrote:
>
> > John,
> >
> > I'm not an expert in Kafka, but I would assume so.
> >
> > Best regards,
> > Radek Gruchalski
> > ra...@gruchalski.com
> > de.linkedin.com/in/radgruchalski/
> >
> > Confidentiality:
> > This communication is intended for the above-named person and may be
> > confidential and/or legally privileged.
> > If it has come to you in error you must take no action based on it,
> > nor must you copy or show it to anyone; please delete/destroy and
> > inform the sender immediately.
> >
> > On Thursday, 5 May 2016 at 01:46, John Bickerstaff wrote:
> >
> > > Thanks - does that mean that the only way to safely back up Kafka
> > > is to have replication?
> > >
> > > (I have done this partially - I can get the entire topic on the
> > > command line, after completely recreating the server, but my code
> > > that is intended to do the same thing just hangs)
> > >
> > > On Wed, May 4, 2016 at 3:18 PM, Rad Gruchalski wrote:
> > >
> > > > John,
> > > >
> > > > I believe you mean something along the lines of:
> > > > http://markmail.org/message/f7xb5okr3ujkplk4
> > > > I don't think something like this has been done.
> > > >
> > > > Best regards,
> > > > Radek Gruchalski
> > > > ra...@gruchalski.com
> > > > de.linkedin.com/in/radgruchalski/
> > > >
> > > > Confidentiality:
> > > > This communication is intended for the above-named person and
> > > > may be confidential and/or legally privileged.
> > > > If it has come to you in error you must take no action based on
> > > > it, nor must you copy or show it to anyone; please delete/destroy
> > > > and inform the sender immediately.
> > > >
> > > > On Wednesday, 4 May 2016 at 23:04, John Bickerstaff wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I have what is probably an edge use case. I'd like to back up a
> > > > > single Kafka instance such that I can recreate a new server, drop
> > > > > Kafka in, drop the data in, start Kafka -- and have all my data
> > > > > ready to go again for consumers.
> > > > >
> > > > > Is such a thing done? Does anyone have any experience trying this?
> > > > >
> > > > > I have, and I've run into some problems which suggest there's a
> > > > > setting or some other thing I'm unaware of...
> > > > >
> > > > > If you like, don't think of it as a backup problem so much as a
> > > > > "cloning" problem. I want to clone a new Kafka machine without
> > > > > actually cloning it -
> > > > > I.E. the data is somewhere else (log and index files) although
> > > > > Zookeeper is up and running just fine.
> > > > >
> > > > > Thanks






UNSUBSCRIBE PLEASE

2017-01-12 Thread williamtellme123

-Original Message-
From: Kuldeep Kamboj [mailto:kuldeep.kam...@osscube.com] 
Sent: Monday, May 2, 2016 11:29 PM
To: users@kafka.apache.org
Subject: Getting Timed out reading socket error for kafka cluster setup

Hi,

I want to set up a Kafka cluster for three similar applications that have the
same queues, like AppA -> {TopicX, TopicY, TopicZ}, AppB -> {TopicX, TopicZ},
AppC -> {TopicX, TopicY}. Producers and consumers will be app-specific.
I set up the Kafka cluster with three brokers (partitions 1, 2, 3) in three
different config files with different ports, then started the Kafka servers
(cluster).

I am using the Kafka PHP wrapper from http://github.com/nmred/kafka-php

So I used producer code for App A like:

    $producer->setRequireAck(-1);
    $producer->setMessages("TopicX", 0, array(json_encode($this->data)));
    $producer->send();

and producer code for App B like:

    $producer->setRequireAck(-1);
    $producer->setMessages("TopicX", 1, array(json_encode($this->data)));
    $producer->send();

And so on.

Then I made my consumer scripts for the three apps like:

    $queues = array("TopicX", "TopicY", "TopicZ");
    while (true) {
        foreach ($queues as $queue) {
            $consumer = \Kafka\Consumer::getInstance('localhost:2181');
            $consumer->setGroup('testgroup');
            $consumer->setPartition($queue, 0);
            $result = $consumer->fetch();
        }
    }

But when I try to execute the consumer script for any app, I get an error
like:

    "Timed out reading socket while reading 750437 bytes with 750323 bytes to go"

I just don't know how I can fix this issue. I tried to modify some Kafka
config parameters like:

    zookeeper.connection.timeout.ms=24000   # initially 6000
    replica.socket.timeout.ms=15000         # does not exist in the default file

but that did not work.


--
Kuldeep Kamboj



Unsubscribe

2017-01-12 Thread williamtellme123
Please unsubscribe me

 

From: Varanasi, Venkata [mailto:venkata.varan...@bankofamerica.com] 
Sent: Thursday, April 28, 2016 1:35 PM
To: dev@spark.apache.org
Subject: Unsubscribe

 

 


This message, and any attachments, is for the intended recipient(s) only,
may contain information that is privileged, confidential and/or proprietary
and subject to important terms and conditions available at
http://www.bankofamerica.com/emaildisclaimer. If you are not the intended
recipient, please delete this message.



UNSUBSCRIBE

2017-01-12 Thread williamtellme123
 

 

From: Harjit Singh [mailto:harjit.si...@deciphernow.com] 
Sent: Tuesday, April 26, 2016 3:11 PM
To: user@spark.apache.org
Subject: test




Re: Initializing StateStores takes *really* long for large datasets

2016-12-03 Thread williamtellme123
Unsubscribe


Sent via the Samsung Galaxy S7, an AT&T 4G LTE smartphone

-------- Original message --------
From: Guozhang Wang
Date: 12/2/16 5:13 PM (GMT-06:00)
To: users@kafka.apache.org
Subject: Re: Initializing StateStores takes *really* long for large datasets
Before we have the single-knob memory management feature, I'd like to
propose reducing the Streams default config values for RocksDB caching and
memory block size. For example, I remember Henry has done some fine tuning
of the RocksDB config for his use case:

https://github.com/HenryCaiHaiying/kafka/commit/b297f7c585f5a883ee068277e5f0f1224c347bd4
https://github.com/HenryCaiHaiying/kafka/commit/eed1726d16e528d813755a6e66b49d0bf14e8803
https://github.com/HenryCaiHaiying/kafka/commit/ccc4e25b110cd33eea47b40a2f6bf17ba0924576

We could check whether some of those changes are appropriate in general and,
if so, change the default settings accordingly.

Henry

On Wed, Nov 30, 2016 at 11:04 AM, Ara Ebrahimi wrote:

> +1 on this.
>
> Ara.
>
> > On Nov 30, 2016, at 5:18 AM, Mathieu Fenniak <mathieu.fenn...@replicon.com> wrote:
> >
> > I'd like to quickly reinforce Frank's opinion regarding the rocksdb memory
> > usage. I was also surprised by the amount of non-JVM-heap memory being used
> > and had to tune the 100 MB default down considerably. It's also unfortunate
> > that it's hard to estimate the memory requirements for a KS app because of
> > this. If you have ten stores, and assuming the default config, you'd need a
> > GB of memory for the rocksdb cache if you run 1 app, but only half a GB if
> > you run two app instances because the stores will be distributed.
> >
> > It would be much nicer to be able to give KS a fixed amount of memory in a
> > config that it divided among the active stores on a node. Configure it with
> > N GB; if a rebalance adds more tasks and stores, they each get less RAM; if
> > a rebalance removes tasks and stores, the remaining stores get more RAM. It
> > seems like it'd be hard to do this with the RocksDBConfigSetter interface
> > because it doesn't get any state about the KS topology to make decisions;
> > which are arguably not config, but tuning / performance decisions.
> >
> > Mathieu
> >
> > On Mon, Nov 28, 2016 at 3:45 PM, Frank Lyaruu wrote:
> >
> >> I'll write an update on where I am now.
> >>
> >> I've got about 40 'primary' topics, some small, some up to about 10M
> >> messages, and about 30 internal topics, divided over 6 stream instances,
> >> all running in a single app, talking to a 3 node Kafka cluster.
> >>
> >> I use a single thread per stream instance, as my prime concern is now to
> >> get it to run stable, rather than optimizing performance.
> >>
> >> My biggest issue was that after a few hours my application started to
> >> slow down, to ultimately freeze up or crash. It turned out that RocksDb
> >> consumed all my memory, which I overlooked as it was off-heap.
> >>
> >> I was fooling around with RocksDb settings a bit but I had missed the
> >> most important one:
> >>
> >>    BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
> >>    tableConfig.setBlockCacheSize(BLOCK_CACHE_SIZE);
> >>    tableConfig.setBlockSize(BLOCK_SIZE);
> >>    options.setTableFormatConfig(tableConfig);
> >>
> >> The block cache size defaults to a whopping 100Mb per store, and that
> >> gets expensive fast. I reduced it to a few megabytes. My data size is so
> >> big that I doubt it is very effective anyway. Now it seems more stable.
> >>
> >> I'd say that a smaller default makes sense, especially because the
> >> failure case is so opaque (running all tests just fine, but with a
> >> serious dataset it dies slowly).
> >>
> >> Another thing I see is that while starting all my instances, some are
> >> quick and some take time (makes sense as the data size varies greatly),
> >> but as more instances start up, they start to use more and more CPU, I/O
> >> and network, so the initialization of the bigger ones takes even longer,
> >> increasing the chance that one of them takes longer than
> >> MAX_POLL_INTERVAL_MS_CONFIG, and then all hell breaks loose. Maybe we
> >> can separate the 'initialize' and 'start' steps somehow.
> >>
> >> In this case we could log better: if initialization is taking longer
> >> than the timeout, it ends up being reassigned (in my case to the same
> >> instance) and then it errors out on being unable to lock the state dir.
> >> That message isn't too informative, as the timeout is the actual problem.
> >>
> >> regards, Frank
> >>
> >> On Mon, Nov 28, 2016 at 8:01 PM, Guozhang Wang wrote:
> >>
> >>> Hello Frank,
> >>>
> >>> How many instances do you have in your apps and how many threads did
> >>> you use per thread? Note that besides the

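For reference, the tuning Frank describes can be applied per application
through Kafka Streams' RocksDBConfigSetter hook; a hedged sketch with
illustrative sizes (not recommendations), where props is the application's
StreamsConfig properties:

    import java.util.Map;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.state.RocksDBConfigSetter;
    import org.rocksdb.BlockBasedTableConfig;
    import org.rocksdb.Options;

    public class SmallCacheConfig implements RocksDBConfigSetter {
        @Override
        public void setConfig(String storeName, Options options, Map<String, Object> configs) {
            BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
            tableConfig.setBlockCacheSize(4 * 1024 * 1024L);  // 4 MB instead of the 100 MB default
            tableConfig.setBlockSize(8 * 1024L);              // 8 KB blocks
            options.setTableFormatConfig(tableConfig);
        }
    }

    // Registered through the streams configuration:
    props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, SmallCacheConfig.class);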
Re: Kafka windowed table not aggregating correctly

2016-12-03 Thread williamtellme123
Unsubscribe


Sent via the Samsung Galaxy S7, an AT&T 4G LTE smartphone

-------- Original message --------
From: Guozhang Wang
Date: 12/2/16 5:48 PM (GMT-06:00)
To: users@kafka.apache.org
Subject: Re: Kafka windowed table not aggregating correctly
Sachin,

One thing to note is that the retention of windowed stores works by keeping
multiple segments of the stores, where each segment stores a time range that
can potentially span multiple windows. The oldest segment is dropped when a
new window needs to be created that is further than the oldest segment's
time range + retention period. From your code it seems you do not override
the retention via until(...) on TimeWindows.of("stream-table",
10 * 1000L).advanceBy(5 * 1000L), so the default of one day is used.

So with WallclockTimestampExtractor, since it is using system time, it won't
give you timestamps that span more than a day within a short period of time;
but if your own defined timestamps span more than that, then old segments
will be dropped immediately and hence the aggregate values will be returned
as a single value.

Guozhang

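A hedged illustration of the override Guozhang refers to (the seven-day
retention is an arbitrary example, not a recommendation):

    // Keep window segments for 7 days instead of the 1-day default, so
    // windows keyed by older custom timestamps are not dropped prematurely.
    TimeWindows.of("stream-table", 10 * 1000L)
        .advanceBy(5 * 1000L)
        .until(7 * 24 * 60 * 60 * 1000L);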

On Fri, Dec 2, 2016 at 11:58 AM, Matthias J. Sax wrote:

> The extractor is used in
>
> org.apache.kafka.streams.processor.internals.RecordQueue#addRawRecords()
>
> Let us know if you could resolve the problem or need more help.
>
> -Matthias
>
> On 12/2/16 11:46 AM, Sachin Mittal wrote:
> > https://github.com/SOHU-Co/kafka-node/ this is the node.js client I am
> > using. The version is 0.5.x. Can you please tell me what code in streams
> > calls the timestamp extractor? I can look there to see if there is any
> > issue.
> >
> > Again, the issue happens only when producing the messages using a
> > producer that is compatible with kafka version 0.8.x. I see that this
> > producer does not send a record timestamp, as this was introduced in
> > version 0.10 only.
> >
> > Thanks
> > Sachin
> >
> > On 3 Dec 2016 1:03 a.m., "Matthias J. Sax" wrote:
> >
> >> I am not sure what is happening. That's why it would be good to have a
> >> toy example to reproduce the issue.
> >>
> >> What do you mean by "Kafka node version 0.5"?
> >>
> >> -Matthias
> >>
> >> On 12/2/16 11:30 AM, Sachin Mittal wrote:
> >>> I can provide the data, but data does not seem to be the issue.
> >>> If I submit the same data and use the same timestamp extractor using
> >>> the java client with kafka version 0.10.0.1, aggregation works fine.
> >>> I find the issue only when submitting the data with kafka node
> >>> version 0.5. It looks like the stream does not extract the time
> >>> correctly in that case.
> >>>
> >>> Thanks
> >>> Sachin
> >>>
> >>> On 2 Dec 2016 11:41 p.m., "Matthias J. Sax" wrote:
> >>>
> >>>> Can you provide example input data (including timestamps) and the
> >>>> result? What is the expected result (i.e., what aggregation do you
> >>>> apply)?
> >>>>
> >>>> -Matthias
> >>>>
> >>>> On 12/2/16 7:43 AM, Sachin Mittal wrote:
> >>>>> Hi,
> >>>>> After much debugging I found an issue with the timestamp extractor.
> >>>>>
> >>>>> If I use a custom timestamp extractor with the following code:
> >>>>>
> >>>>>     public static class MessageTimestampExtractor implements
> >>>>>             TimestampExtractor {
> >>>>>         public long extract(ConsumerRecord<Object, Object> record) {
> >>>>>             if (record.value() instanceof Message) {
> >>>>>                 return ((Message) record.value()).ts;
> >>>>>             } else {
> >>>>>                 return record.timestamp();
> >>>>>             }
> >>>>>         }
> >>>>>     }
> >>>>>
> >>>>> Here Message has a long field ts which stores the timestamp, and
> >>>>> the aggregation does not work.
> >>>>> Note I have checked, and ts has valid timestamp values.
> >>>>>
> >>>>> However if I replace it with, say, WallclockTimestampExtractor, the
> >>>>> aggregation is working fine.
> >>>>>
> >>>>> I do not understand what could be the issue here.
> >>>>>
> >>>>> Also note I am using kafka streams version 0.10.0.1 and I am
> >>>>> publishing messages via https://github.com/SOHU-Co/kafka-node/
> >>>>> whose version is quite old, 0.5.x.
> >>>>>
> >>>>> Let me know if there is some bug in timestamp extraction.
> >>>>>
> >>>>> Thanks
> >>>>> Sachin
> >>>>>
> >>>>> On Mon, Nov 28, 2016 at 11:52 PM, Guozhang Wang wrote:
> >>>>>
> >>>>>> Sachin,
> >>>>>>
> >>>>>> This is indeed a bit weird, and we'd like to try to reproduce your
> >>>>>> issue locally. Do you have a sample input data for us to try out?
> >>>>>>
> >>>>>> Guozhang
> >>>>>>
> >>>>>> On Fri, Nov 25, 2016 at 10:12 PM, Sachin Mittal wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>> I fixed that sorted set issue but I am facing a weird problem
> >>>>>>> which I am not able to replicate.
> >>>>>>>
> >>>>>>> Here is the sample problem that I could isolate:
> >>>>>>> My class is like this:
> >>>>>>> public static class Message

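For reference, a hedged sketch of how such an extractor is wired in, and
guarded against the -1 timestamps that pre-0.10 producers leave behind
(Message is the type from the thread's code; props, the application's
StreamsConfig properties, is assumed):

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.processor.TimestampExtractor;

    public static class SafeTimestampExtractor implements TimestampExtractor {
        @Override
        public long extract(ConsumerRecord<Object, Object> record) {
            if (record.value() instanceof Message) {
                return ((Message) record.value()).ts;   // embedded timestamp
            }
            long ts = record.timestamp();
            // 0.8.x producers send no timestamp; the broker surfaces -1,
            // which breaks windowing, so fall back to wall-clock time.
            return ts >= 0 ? ts : System.currentTimeMillis();
        }
    }

    // Registration:
    props.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
              SafeTimestampExtractor.class.getName());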
UNSUBSCRIBE

2015-12-11 Thread williamtellme123


