RE: [commons-io] TeeInputStream that ignores skip/reset?

2015-12-17 Thread Allison, Timothy B.
Right, that's the use case.  In Tika, we have no control over what our 
dependencies are doing to the stream.  

The current implementation uses mark/reset to digest and then parse, up to a 
certain limit. Beyond that limit, we cache to disk and then digest and parse 
the tmp file separately.  
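
For readers following along, the mark/reset step could be sketched roughly as below. This is an illustration only, not Tika's actual code: the class name MarkResetDigester, the method digestThenReset and the markLimit parameter are all hypothetical, and markLimit is assumed to be smaller than Integer.MAX_VALUE.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.security.MessageDigest;

// Hypothetical helper, not part of Tika or commons-io.
class MarkResetDigester {
    /**
     * Digests the stream if it holds at most markLimit bytes, then rewinds it
     * so a parser can read it from the start. Returns null when the stream is
     * longer than markLimit; the caller would then spool to a tmp file.
     */
    static byte[] digestThenReset(BufferedInputStream in, MessageDigest md, int markLimit)
            throws IOException {
        // One extra byte lets us detect "longer than markLimit".
        in.mark(markLimit + 1);
        byte[] buf = new byte[4096];
        long total = 0;
        while (total <= markLimit) {
            // Never read past markLimit + 1 bytes, so the mark stays valid.
            int want = (int) Math.min(buf.length, markLimit + 1L - total);
            int n = in.read(buf, 0, want);
            if (n == -1) {
                break;
            }
            md.update(buf, 0, n);
            total += n;
        }
        in.reset();
        if (total > markLimit) {
            md.reset(); // discard the partial digest
            return null;
        }
        return md.digest();
    }
}
```

Note that the digest pass reads the whole (limited) stream before the parser sees a single byte, which is what leads to the downside described next.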

The downside to this (TIKA-1701) is that, for truncated zip/package files, the 
digester reads to the end of the stream for an embedded file and hits the zip 
exception. The parser then fails to extract the contents of as many files as 
it would have if it had just been parsing the file without the digester.

If skip/reset don't make any sense for a DigestingInputStream generally, I'll 
keep our modified TeeInputStream over in Tika land.
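
To make the idea concrete, here is a minimal sketch of a stream whose digest sees every byte exactly once, in order, whatever the consumer does with skip/reset. It is not the modified TeeInputStream from Tika; the class name SequentialDigestInputStream is hypothetical, and turning skip() into reads is just one assumed way to keep skipped bytes in the digest.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;

// Hypothetical sketch: each byte reaches the digest once, even if the
// consumer marks, resets, re-reads, or skips.
class SequentialDigestInputStream extends FilterInputStream {
    private final MessageDigest digest;
    private long pos = 0;      // current position in the underlying stream
    private long digested = 0; // count of leading bytes already digested
    private long markPos = 0;  // position recorded at the last mark()

    SequentialDigestInputStream(InputStream in, MessageDigest digest) {
        super(in);
        this.digest = digest;
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b >= 0) {
            if (pos >= digested) { // first time this byte is seen
                digest.update((byte) b);
                digested = pos + 1;
            }
            pos++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) {
            long end = pos + n;
            if (end > digested) {
                // Only the tail of this read is new; the rest is a re-read.
                int fresh = (int) (end - digested);
                digest.update(buf, off + n - fresh, fresh);
                digested = end;
            }
            pos = end;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        // Read instead of skipping so skipped bytes still reach the digest.
        byte[] scratch = new byte[4096];
        long remaining = n;
        while (remaining > 0) {
            int r = read(scratch, 0, (int) Math.min(scratch.length, remaining));
            if (r < 0) {
                break;
            }
            remaining -= r;
        }
        return n - remaining;
    }

    @Override
    public synchronized void mark(int readlimit) {
        in.mark(readlimit);
        markPos = pos;
    }

    @Override
    public synchronized void reset() throws IOException {
        in.reset();
        pos = markPos; // bytes read again after this point are not re-digested
    }
}
```

With something like this, the digest only ever covers the bytes the parser actually pulled, so a truncated package would fail at the same point it would have without the digester. The digest of a partially read stream is of course only a digest of that prefix.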

If there are other recommendations for handling this, let me know.

Thank you!

Best,

  Tim

-----Original Message-----
From: sebb [mailto:seb...@gmail.com] 
Sent: Wednesday, December 16, 2015 1:07 PM
To: Commons Users List <user@commons.apache.org>
Subject: Re: [commons-io] TeeInputStream that ignores skip/reset?

I'm not sure what the use case for this is, apart from avoiding the bug in 
DigestingInputStream.
Which can be avoided by not using skip/reset.

I'm not sure that skip/reset make any sense for a DigestingInputStream anyway.


On 16 December 2015 at 12:19, Allison, Timothy B. <talli...@mitre.org> wrote:
> All,
>   Over on Tika, we'd like a DigestingInputStream that ignores skip/reset 
> (unlike Java's v <= 1.8 [0]).  Before we reinvent the wheel, is there an 
> InputStream similar to TeeInputStream that ignores skip/reset, so that the 
> Digester would only see the stream as if it were read sequentially without 
> skip/reset?
>   If we do reinvent the wheel, should we contribute this InputStream to 
> commons-io as an alternative to TeeInputStream?
>   Or, even more generally, are there other recommendations for handling this? 
>  Thank you!
>
>  Best,
>
>  Tim
>
> [0] 
> http://mail-archives.apache.org/mod_mbox/commons-user/201508.mbox/%3CDM2PR09MB07135F86C7AC6981F1BB216BC78A0%40DM2PR09MB0713.namprd09.prod.outlook.com%3E

-
To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
For additional commands, e-mail: user-h...@commons.apache.org



Re: [commons-io] TeeInputStream that ignores skip/reset?

2015-12-16 Thread sebb
I'm not sure what the use case for this is, apart from avoiding the
bug in DigestingInputStream.
Which can be avoided by not using skip/reset.

I'm not sure that skip/reset make any sense for a DigestingInputStream anyway.


On 16 December 2015 at 12:19, Allison, Timothy B.  wrote:
> All,
>   Over on Tika, we'd like a DigestingInputStream that ignores skip/reset 
> (unlike Java's v <= 1.8 [0]).  Before we reinvent the wheel, is there an 
> InputStream similar to TeeInputStream that ignores skip/reset, so that the 
> Digester would only see the stream as if it were read sequentially without 
> skip/reset?
>   If we do reinvent the wheel, should we contribute this InputStream to 
> commons-io as an alternative to TeeInputStream?
>   Or, even more generally, are there other recommendations for handling this? 
>  Thank you!
>
>  Best,
>
>  Tim
>
> [0] 
> http://mail-archives.apache.org/mod_mbox/commons-user/201508.mbox/%3CDM2PR09MB07135F86C7AC6981F1BB216BC78A0%40DM2PR09MB0713.namprd09.prod.outlook.com%3E
