Re: [ANN] durable-queue: an in-process disk-backed queue

2014-03-10 Thread Zach Tellman
Hey Leif,

When using :fsync-interval, the actual calls to .force() on the underlying 
ByteBuffers occur on another thread, making it effectively a background 
process.  In contrast, :fsync-threshold synchronously calls fsync when the 
write threshold is hit.  Note that if :fsync-interval is shorter than the 
time it takes to actually fsync, it will simply fsync continuously at 
whatever pace it can.
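
To make the difference concrete, here is a minimal Java sketch of the two modes. It is not durable-queue's code: the class `FsyncModes`, its fields, and its methods are hypothetical names invented for illustration, and only `MappedByteBuffer.force()` and the `java.util.concurrent` scheduling calls are real JDK APIs.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; none of these names come from durable-queue.
class FsyncModes {
    final MappedByteBuffer buffer;
    final AtomicLong forceCalls = new AtomicLong(); // number of force() calls so far
    private final int threshold;                    // <= 0 disables threshold syncing
    private int writesSinceSync = 0;

    FsyncModes(Path file, int size, int threshold) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            this.buffer = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
        this.threshold = threshold;
    }

    // Threshold mode: the writer itself blocks in force() every `threshold` writes.
    synchronized void write(byte b) {
        buffer.put(b);
        if (threshold > 0 && ++writesSinceSync >= threshold) {
            sync();
            writesSinceSync = 0;
        }
    }

    void sync() {
        buffer.force();
        forceCalls.incrementAndGet();
    }

    // Interval mode: a separate thread calls force(), so writers never wait on it.
    // With scheduleAtFixedRate, a force() that outlasts the period simply delays
    // the next run, so syncing proceeds back-to-back at whatever pace the disk allows.
    ScheduledExecutorService startIntervalSync(long millis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(this::sync, millis, millis, TimeUnit.MILLISECONDS);
        return ses;
    }

    public static void main(String[] args) throws Exception {
        FsyncModes byThreshold = new FsyncModes(Files.createTempFile("q", ".bin"), 4096, 10);
        for (int i = 0; i < 100; i++) byThreshold.write((byte) i);
        System.out.println("threshold syncs: " + byThreshold.forceCalls.get());

        FsyncModes byInterval = new FsyncModes(Files.createTempFile("q", ".bin"), 4096, 0);
        ScheduledExecutorService syncer = byInterval.startIntervalSync(50);
        for (int i = 0; i < 100; i++) byInterval.write((byte) i);
        Thread.sleep(200);
        syncer.shutdown();
        System.out.println("interval syncs: " + byInterval.forceCalls.get());
    }
}
```

In this sketch, the interval-mode writer never pays for a sync, so a benchmark that times only the writer would probably look much the same at any interval.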

If you'd like to verify, I suggest using strace.

Zach

On Sunday, March 9, 2014 6:54:50 PM UTC-7, Leif wrote:
>
> Hi, Zach.
>
> I was trying to benchmark at different values of the :fsync-* parameters, 
> and I noticed that it didn't matter what value of :fsync-interval I set, 
> the performance was constant, and about what it is with both :fsync-put? 
> and :fsync-take? disabled.
>
> Any suggestions on how to test if data is actually being synced to disk at 
> my specified interval?
>
> Please forgive my suspicious nature,
> Leif
>
> On Friday, March 7, 2014 4:21:44 PM UTC-5, Zach Tellman wrote:
>>
>> I added the above-described features a few weeks back, but only got 
>> around to marking 0.1.1 today.  Fsync batching is described at the end of 
>> the README, let me know if you have any questions.
>>
>> On Friday, February 7, 2014 11:52:11 AM UTC-8, Zach Tellman wrote:
>>>
>>> Hi Bob,
>>>
>>> Right now the API only allows for single puts, and fsyncing is 
>>> all-or-nothing.  However, this is just an artifact of my major use case for 
>>> the library, which relies on upstream batching of tasks.  I'm planning an 
>>> 0.1.1 release which has an explicit `sync` method, and support for 
>>> sync-intervals (i.e. sync twice a second) and sync-thresholds (i.e. sync 
>>> every ten puts or takes).  The use case you describe could be achieved by 
>>> disabling automatic syncing, and doing a series of puts and takes followed 
>>> by a call to `sync`.
>>>
>>> If you have thoughts or suggestions on how this can be more useful for 
>>> you, please let me know.
>>>
>>> Zach
>>>
>>>
>>> On Fri, Feb 7, 2014 at 5:26 AM, Bob Hutchison wrote:
>>>

>>>> On Feb 6, 2014, at 6:45 PM, Zach Tellman wrote:
>>>>
>>>> At Factual we get a lot of data thrown at us, and often don't have 
>>>> control over the rate at which it comes in.  As such, it's preferable that 
>>>> our buffer isn't bounded by the process' memory, since a temporary blip in 
>>>> throughput may cause GC pauses, OOM exceptions, and other things that will 
>>>> only exacerbate the problem.  It's also preferable that if the process 
>>>> dies, we won't lose any data which hasn't yet escaped the process.  A 
>>>> disk-backed queue satisfies both of these requirements.
>>>>
>>>> As such, I'm happy to announce that we're open sourcing 
>>>> 'durable-queue': https://github.com/Factual/durable-queue.  It's a 
>>>> small, fast, pure-Clojure implementation that in our production systems is 
>>>> responsible for processing billions of entries daily.  We believe it has 
>>>> broad applications, and are excited to see how others will use it.
>>>>
>>>>
>>>> What excellent timing! I’ve been looking at ZeroMQ, RabbitMQ, and Kafka 
>>>> for the last week or so. ZMQ is awfully attractive for what I’m trying to 
>>>> do, but there are a few things it doesn’t do that I need done. I had begun 
>>>> thinking of building something similar on top of Redis.
>>>>
>>>> You mention the idea of batching to reduce the impact of fsync. Is 
>>>> there an API for batching puts? Is there a way to batch a complete! and 
>>>> put! new tasks to the queue?
>>>>
>>>> One pattern that keeps coming up is:
>>>>    - take a single task from the queue
>>>>    - execute the task, which might generate a set of new tasks to be 
>>>>      queued on the same queue (and likely on other queues too)
>>>>    - signal completion, and put the new tasks
>>>>
>>>> Cheers,
>>>> Bob
>>>>
>>>>
>>>> Zach
>>>>
>>>> P.S. If this sort of work is interesting to you, Factual is hiring: 
>>>> https://groups.google.com/forum/#!searchin/clojure/factual/clojure/8bPIEnNpfyQ/lvv-9gkVozAJ
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Clojure" group.
>>>> To post to this group, send email to clo...@googlegroups.com
>>>> Note that posts from new members are moderated - please be patient with 
>>>> your first post.
>>>> To unsubscribe from this group, send email to
>>>> clojure+u...@googlegroups.com
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/clojure?hl=en
>>>> --- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "Clojure" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to clojure+u...@googlegroups.com.
>>>>
>>>> For more options, visit https://groups.google.com/groups/opt_out.



Re: [ANN] durable-queue: an in-process disk-backed queue

2014-03-09 Thread Leif
Hi, Zach.

I was trying to benchmark at different values of the :fsync-* parameters, 
and I noticed that it didn't matter what value of :fsync-interval I set, 
the performance was constant, and about what it is with both :fsync-put? 
and :fsync-take? disabled.

Any suggestions on how to test if data is actually being synced to disk at 
my specified interval?

Please forgive my suspicious nature,
Leif

On Friday, March 7, 2014 4:21:44 PM UTC-5, Zach Tellman wrote:
>
> I added the above-described features a few weeks back, but only got around 
> to marking 0.1.1 today.  Fsync batching is described at the end of the 
> README, let me know if you have any questions.
>
> On Friday, February 7, 2014 11:52:11 AM UTC-8, Zach Tellman wrote:
>>
>> Hi Bob,
>>
>> Right now the API only allows for single puts, and fsyncing is 
>> all-or-nothing.  However, this is just an artifact of my major use case for 
>> the library, which relies on upstream batching of tasks.  I'm planning an 
>> 0.1.1 release which has an explicit `sync` method, and support for 
>> sync-intervals (i.e. sync twice a second) and sync-thresholds (i.e. sync 
>> every ten puts or takes).  The use case you describe could be achieved by 
>> disabling automatic syncing, and doing a series of puts and takes followed 
>> by a call to `sync`.
>>
>> If you have thoughts or suggestions on how this can be more useful for 
>> you, please let me know.
>>
>> Zach
>>
>>
>> On Fri, Feb 7, 2014 at 5:26 AM, Bob Hutchison wrote:
>>
>>>
>>> On Feb 6, 2014, at 6:45 PM, Zach Tellman  wrote:
>>>
>>> At Factual we get a lot of data thrown at us, and often don't have 
>>> control over the rate at which it comes in.  As such, it's preferable that 
>>> our buffer isn't bounded by the process' memory, since a temporary blip in 
>>> throughput may cause GC pauses, OOM exceptions, and other things that will 
>>> only exacerbate the problem.  It's also preferable that if the process 
>>> dies, we won't lose any data which hasn't yet escaped the process.  A 
>>> disk-backed queue satisfies both of these requirements.
>>>
>>> As such, I'm happy to announce that we're open sourcing 'durable-queue': 
>>> https://github.com/Factual/durable-queue.  It's a small, fast, 
>>> pure-Clojure implementation that in our production systems is responsible 
>>> for processing billions of entries daily.  We believe it has broad 
>>> applications, and are excited to see how others will use it.
>>>
>>>
>>> What excellent timing! I’ve been looking at ZeroMQ, RabbitMQ, and Kafka 
>>> for the last week or so. ZMQ is awfully attractive for what I’m trying to 
>>> do, but there are a few things it doesn’t do that I need done. I had begun 
>>> thinking of building something similar on top of Redis.
>>>
>>> You mention the idea of batching to reduce the impact of fsync. Is there 
>>> an API for batching puts? Is there a way to batch a complete! and put! new 
>>> tasks to the queue?
>>>
>>> One pattern that keeps coming up is:
>>>- take a single task from the queue
>>>- execute the task, which might generate a set of new tasks to be 
>>> queued on the same queue (and likely on other queues too)
>>>- signal completion, and put the new tasks
>>>
>>> Cheers,
>>> Bob
>>>
>>>
>>> Zach
>>>
>>> P.S. If this sort of work is interesting to you, Factual is hiring: 
>>> https://groups.google.com/forum/#!searchin/clojure/factual/clojure/8bPIEnNpfyQ/lvv-9gkVozAJ
>>>

Re: [ANN] durable-queue: an in-process disk-backed queue

2014-03-07 Thread Zach Tellman
I added the above-described features a few weeks back, but only got around 
to marking 0.1.1 today.  Fsync batching is described at the end of the 
README; let me know if you have any questions.

On Friday, February 7, 2014 11:52:11 AM UTC-8, Zach Tellman wrote:
>
> Hi Bob,
>
> Right now the API only allows for single puts, and fsyncing is 
> all-or-nothing.  However, this is just an artifact of my major use case for 
> the library, which relies on upstream batching of tasks.  I'm planning an 
> 0.1.1 release which has an explicit `sync` method, and support for 
> sync-intervals (i.e. sync twice a second) and sync-thresholds (i.e. sync 
> every ten puts or takes).  The use case you describe could be achieved by 
> disabling automatic syncing, and doing a series of puts and takes followed 
> by a call to `sync`.
>
> If you have thoughts or suggestions on how this can be more useful for 
> you, please let me know.
>
> Zach
>
>
> On Fri, Feb 7, 2014 at 5:26 AM, Bob Hutchison wrote:
>
>>
>> On Feb 6, 2014, at 6:45 PM, Zach Tellman wrote:
>>
>> At Factual we get a lot of data thrown at us, and often don't have 
>> control over the rate at which it comes in.  As such, it's preferable that 
>> our buffer isn't bounded by the process' memory, since a temporary blip in 
>> throughput may cause GC pauses, OOM exceptions, and other things that will 
>> only exacerbate the problem.  It's also preferable that if the process 
>> dies, we won't lose any data which hasn't yet escaped the process.  A 
>> disk-backed queue satisfies both of these requirements.
>>
>> As such, I'm happy to announce that we're open sourcing 'durable-queue': 
>> https://github.com/Factual/durable-queue.  It's a small, fast, 
>> pure-Clojure implementation that in our production systems is responsible 
>> for processing billions of entries daily.  We believe it has broad 
>> applications, and are excited to see how others will use it.
>>
>>
>> What excellent timing! I’ve been looking at ZeroMQ, RabbitMQ, and Kafka 
>> for the last week or so. ZMQ is awfully attractive for what I’m trying to 
>> do, but there are a few things it doesn’t do that I need done. I had begun 
>> thinking of building something similar on top of Redis.
>>
>> You mention the idea of batching to reduce the impact of fsync. Is there 
>> an API for batching puts? Is there a way to batch a complete! and put! new 
>> tasks to the queue?
>>
>> One pattern that keeps coming up is:
>>- take a single task from the queue
>>- execute the task, which might generate a set of new tasks to be 
>> queued on the same queue (and likely on other queues too)
>>- signal completion, and put the new tasks
>>
>> Cheers,
>> Bob
>>
>>
>> Zach
>>
>> P.S. If this sort of work is interesting to you, Factual is hiring: 
>> https://groups.google.com/forum/#!searchin/clojure/factual/clojure/8bPIEnNpfyQ/lvv-9gkVozAJ


Re: [ANN] durable-queue: an in-process disk-backed queue

2014-02-07 Thread Zach Tellman
Hi Bob,

Right now the API only allows for single puts, and fsyncing is
all-or-nothing.  However, this is just an artifact of my major use case for
the library, which relies on upstream batching of tasks.  I'm planning a
0.1.1 release which has an explicit `sync` method, and support for
sync-intervals (e.g. sync twice a second) and sync-thresholds (e.g. sync
every ten puts or takes).  The use case you describe could be achieved by
disabling automatic syncing, and doing a series of puts and takes followed
by a call to `sync`.
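
As a rough illustration of that pattern (several writes, then one explicit sync), here is a small Java sketch. It is not durable-queue's API or on-disk format: `BatchedLog`, `put`, `putAll`, and `sync` are hypothetical names, the length-prefixed record layout is an assumption made for the example, and only puts are modeled. The only real library call doing the durability work is `FileChannel.force`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical append-only log; not durable-queue's actual API or file format.
class BatchedLog implements AutoCloseable {
    private final FileChannel channel;
    int syncCalls = 0; // counts explicit syncs, for illustration

    BatchedLog(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    // Append one length-prefixed record; deliberately does NOT force it to disk.
    void put(String task) throws IOException {
        byte[] bytes = task.getBytes(StandardCharsets.UTF_8);
        ByteBuffer record = ByteBuffer.allocate(4 + bytes.length);
        record.putInt(bytes.length).put(bytes).flip();
        while (record.hasRemaining()) {
            channel.write(record);
        }
    }

    // One explicit sync covers every record written since the previous sync.
    void sync() throws IOException {
        channel.force(false);
        syncCalls++;
    }

    // Write a whole batch, then pay for a single fsync.
    void putAll(List<String> tasks) throws IOException {
        for (String t : tasks) {
            put(t);
        }
        sync();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

The trade-off in this sketch is that a crash before `sync()` may lose the whole unsynced batch, in exchange for a single fsync rather than one per record.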

If you have thoughts or suggestions on how this can be more useful for you,
please let me know.

Zach


On Fri, Feb 7, 2014 at 5:26 AM, Bob Hutchison wrote:

>
> On Feb 6, 2014, at 6:45 PM, Zach Tellman  wrote:
>
> At Factual we get a lot of data thrown at us, and often don't have control
> over the rate at which it comes in.  As such, it's preferable that our
> buffer isn't bounded by the process' memory, since a temporary blip in
> throughput may cause GC pauses, OOM exceptions, and other things that will
> only exacerbate the problem.  It's also preferable that if the process
> dies, we won't lose any data which hasn't yet escaped the process.  A
> disk-backed queue satisfies both of these requirements.
>
> As such, I'm happy to announce that we're open sourcing 'durable-queue':
> https://github.com/Factual/durable-queue.  It's a small, fast,
> pure-Clojure implementation that in our production systems is responsible
> for processing billions of entries daily.  We believe it has broad
> applications, and are excited to see how others will use it.
>
>
> What excellent timing! I've been looking at ZeroMQ, RabbitMQ, and Kafka
> for the last week or so. ZMQ is awfully attractive for what I'm trying to
> do, but there are a few things it doesn't do that I need done. I had begun
> thinking of building something similar on top of Redis.
>
> You mention the idea of batching to reduce the impact of fsync. Is there
> an API for batching puts? Is there a way to batch a complete! and put! new
> tasks to the queue?
>
> One pattern that keeps coming up is:
>- take a single task from the queue
>- execute the task, which might generate a set of new tasks to be
> queued on the same queue (and likely on other queues too)
>- signal completion, and put the new tasks
>
> Cheers,
> Bob
>
>
> Zach
>
> P.S. If this sort of work is interesting to you, Factual is hiring:
> https://groups.google.com/forum/#!searchin/clojure/factual/clojure/8bPIEnNpfyQ/lvv-9gkVozAJ
>



Re: [ANN] durable-queue: an in-process disk-backed queue

2014-02-07 Thread Bob Hutchison

On Feb 6, 2014, at 6:45 PM, Zach Tellman  wrote:

> At Factual we get a lot of data thrown at us, and often don't have control 
> over the rate at which it comes in.  As such, it's preferable that our buffer 
> isn't bounded by the process' memory, since a temporary blip in throughput 
> may cause GC pauses, OOM exceptions, and other things that will only 
> exacerbate the problem.  It's also preferable that if the process dies, we 
> won't lose any data which hasn't yet escaped the process.  A disk-backed 
> queue satisfies both of these requirements.
> 
> As such, I'm happy to announce that we're open sourcing 'durable-queue': 
> https://github.com/Factual/durable-queue.  It's a small, fast, pure-Clojure 
> implementation that in our production systems is responsible for processing 
> billions of entries daily.  We believe it has broad applications, and are 
> excited to see how others will use it.

What excellent timing! I've been looking at ZeroMQ, RabbitMQ, and Kafka for the 
last week or so. ZMQ is awfully attractive for what I'm trying to do, but there 
are a few things it doesn't do that I need done. I had begun thinking of 
building something similar on top of Redis.

You mention the idea of batching to reduce the impact of fsync. Is there an API 
for batching puts? Is there a way to batch a complete! and put! new tasks to 
the queue?

One pattern that keeps coming up is:
   - take a single task from the queue
   - execute the task, which might generate a set of new tasks to be queued on 
the same queue (and likely on other queues too)
   - signal completion, and put the new tasks

Cheers,
Bob

> 
> Zach
> 
> P.S. If this sort of work is interesting to you, Factual is hiring: 
> https://groups.google.com/forum/#!searchin/clojure/factual/clojure/8bPIEnNpfyQ/lvv-9gkVozAJ
> 



[ANN] durable-queue: an in-process disk-backed queue

2014-02-06 Thread Zach Tellman
At Factual we get a lot of data thrown at us, and often don't have control 
over the rate at which it comes in.  As such, it's preferable that our 
buffer isn't bounded by the process' memory, since a temporary blip in 
throughput may cause GC pauses, OOM exceptions, and other things that will 
only exacerbate the problem.  It's also preferable that if the process 
dies, we won't lose any data which hasn't yet escaped the process.  A 
disk-backed queue satisfies both of these requirements.

As such, I'm happy to announce that we're open sourcing 'durable-queue': 
https://github.com/Factual/durable-queue.  It's a small, fast, pure-Clojure 
implementation that in our production systems is responsible for processing 
billions of entries daily.  We believe it has broad applications, and are 
excited to see how others will use it.

Zach

P.S. If this sort of work is interesting to you, Factual is hiring: 
https://groups.google.com/forum/#!searchin/clojure/factual/clojure/8bPIEnNpfyQ/lvv-9gkVozAJ
