You could use an interval tree instead of a list. Before inserting into it,
you could check to see if your current chunk number is contiguous with an
existing interval in the tree. If so, merge the chunks and expand the
existing interval. Otherwise, insert an interval of size 1 into the tree.

Stephen Chang's ftree package might be useful here.

- Jon



On Fri, Dec 2, 2016 at 2:39 PM, David Storrs <david.sto...@gmail.com> wrote:

> This is a more business-logic as opposed to syntax/technique question than
> usually shows up on this list, but hopefully folks won't mind.
>
> I started off to ask "what's the best way to find contiguous sequences of
> numbers in a list?" and then realized that maybe I should answer the "What
> are you trying to achieve?" question first.
>
> Short version:
> Tactical problem:  Given a list of sorted numbers, how best to identify
> contiguous sequences within the list?
> Strategic problem:   How best to put a chunked-up file back together given
> a (possibly incomplete) set of chunks and the index number of those chunks?
>
> Long version:
> I have a file that's been broken up into chunks of roughly standard size
> and I want to put the file back together.  I have the chunk number for each
> chunk so I know what order they should be concatenated in.  I may not have
> all the chunks, but I don't want to wait for all of them to arrive before
> starting the assembly, so I'll need to be able to be able to generate
> intermediate products and add to them later when the assembly process is
> re-run.
>
> My database has filepath and chunk-number data for all chunks (even the
> ones that haven't arrived yet; the filepaths are predictable), so I know
> where the chunks will be and what order to assemble them in.
>
> My first thought is to get the sorted list of chunk numbers for all the
> chunks that I currently have, locate contiguous sub-sequences, and assemble
> those sub sequences, then assemble the sub-sequences once they become
> contiguous.  Something like this:
>
> Available chunk nums:  '(1 2 3 5 7 200 201 202 203)
>
> Group them by contiguous:  '( (1 2 3) (5) (7) (200 201 202 203))
>
> Generate superchunks:  1-3, 200-203
>
> Later...
>
> Available chunk nums:  '(1-3 4 5 7 190 193 200-203)
>
> Group them by contiguous:  '( (1-3 4 5) (7) (190) (193) (200-203))
>
> Merge 4 and 5 into 1-3, rename 1-3 as 1-5.
>
> ...etc
>
> I realized I don't actually know a simple way to identify contiguous
> sub-sequences.  I came up with the following, but it feels clumsy.
>
> (define-values (n final)
>   (for/fold ((prev (car nums))
>              (acc '())
>              )
>             ((n (cdr nums)))
>     (values n
>             (if (= n (add1 prev))
>                (cons n acc)
>                (begin
>                   (set! result (cons (reverse acc) result))
>                   (list n))))
>     ))
> (cons (reverse final) result)
>
> Given this:  '(1 2 3 5 7 200 201 202 203)
> It yields this: '((200 201 202 203) (7) (5) (2 3))
>
> Which is fine -- I don't care about the order of the sublists within the
> overall list, as long as each sublist is itself sorted ascending.
>
> Does anyone have any advice to offer, on either the tactical or strategic
> problem?
>
> --
> You received this message because you are subscribed to the Google Groups
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to racket-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to