[jira] [Commented] (SOLR-8292) TransactionLog.next() does not honor contract and return null for EOF

2017-01-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15809882#comment-15809882
 ] 

Mark Miller commented on SOLR-8292:
---

Correct. We know how much we are supposed to read, we don't need a signal and 
if we get an EOF exception it's because the file is corrupt.

> TransactionLog.next() does not honor contract and return null for EOF
> -
>
> Key: SOLR-8292
> URL: https://issues.apache.org/jira/browse/SOLR-8292
> Project: Solr
>  Issue Type: Bug
>Reporter: Erick Erickson
>Assignee: Erick Erickson
> Attachments: SOLR-8292.patch
>
>
> This came to light in CDCR testing, which stresses this code a lot, there's a 
> stack trace showing this line (641 trunk) throwing an EOF exception:
> o = codec.readVal(fis);
> At first I thought to just wrap reading fis in a try/catch and return null, 
> but looking at the code a bit more I'm not so sure, that seems like it'd mask 
> what looks at first glance like a bug in the logic.
> A few lines earlier (633-4) there's these lines:
> // shouldn't currently happen - header and first record are currently written 
> at the same time
> if (fis.position() >= fos.size()) {
> Why are we comparing the the input file position against the size of the 
> output file? Maybe because the 'i' key is right next to the 'o' key? The 
> comment hints that it's checking for the ability to read the first record in 
> input stream along with the header. And perhaps there's a different issue 
> here because the expectation clearly is that the first record should be there 
> if the header is.
> So what's the right thing to do? Wrap in a try/catch and return null for EOF? 
> Change the test? Do both?
> I can take care of either, but wanted a clue whether the comparison of fis to 
> fos is intended.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8292) TransactionLog.next() does not honor contract and return null for EOF

2017-01-06 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806407#comment-15806407
 ] 

Cao Manh Dat commented on SOLR-8292:


I think people kinda of misunderstanding this line
{code}
* @return The log record, or null if EOF
{code} 
{{EOF}} here is not related to {{EOFException}}, {{EOF}} mean when the file is 
fully read to the end. While {{EOFException}} throw by TransactionLog.next() 
mean the file is corrupted. 

For example 
{code:title=TransactionLog.java|borderStyle=solid}
codec.writeTag(JavaBinCodec.ARR, 3);
codec.writeInt(UpdateLog.ADD | flags);  // should just take one byte
codec.writeLong(cmd.getVersion());
codec.writeSolrInputDocument(cmd.getSolrInputDocument());
{code}

So when {{LogReader}} read to the tag {{JavaBinCodec.ARR = 3}}, it will expect 
that there are 3 more elements to be read. But if the file have only 2 elements 
( because the file is corrupted ) it will throw an {{EOFException}}.

> TransactionLog.next() does not honor contract and return null for EOF
> -
>
> Key: SOLR-8292
> URL: https://issues.apache.org/jira/browse/SOLR-8292
> Project: Solr
>  Issue Type: Bug
>Reporter: Erick Erickson
>Assignee: Erick Erickson
> Attachments: SOLR-8292.patch
>
>
> This came to light in CDCR testing, which stresses this code a lot, there's a 
> stack trace showing this line (641 trunk) throwing an EOF exception:
> o = codec.readVal(fis);
> At first I thought to just wrap reading fis in a try/catch and return null, 
> but looking at the code a bit more I'm not so sure, that seems like it'd mask 
> what looks at first glance like a bug in the logic.
> A few lines earlier (633-4) there's these lines:
> // shouldn't currently happen - header and first record are currently written 
> at the same time
> if (fis.position() >= fos.size()) {
> Why are we comparing the the input file position against the size of the 
> output file? Maybe because the 'i' key is right next to the 'o' key? The 
> comment hints that it's checking for the ability to read the first record in 
> input stream along with the header. And perhaps there's a different issue 
> here because the expectation clearly is that the first record should be there 
> if the header is.
> So what's the right thing to do? Wrap in a try/catch and return null for EOF? 
> Change the test? Do both?
> I can take care of either, but wanted a clue whether the comparison of fis to 
> fos is intended.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8292) TransactionLog.next() does not honor contract and return null for EOF

2017-01-06 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805015#comment-15805015
 ] 

Erick Erickson commented on SOLR-8292:
--

[~caomanhdat317] All I was really looking for was whether, in your opinion, 
this was even possible any more, I was just being lazy. This wasn't 
particularly about CDCR, it was just that CDCR exercised it I think.

Please don't spend time trying to reproduce. It sure would have been helpful if 
I'd recorded _what_ test failed a year ago wouldn't it? Shhh.

It's been a long time since I opened this. I'll just start monitoring CDCR 
Jenkins failures (I've noticed a few go by but mostly haven't pursued them) and 
see if anything similar reappears and if not, maybe we can close it. That'll 
take a while before anyone would feel comfortable. Probably should have been 
doing that all along. Ditto with SOLR-4116.

> TransactionLog.next() does not honor contract and return null for EOF
> -
>
> Key: SOLR-8292
> URL: https://issues.apache.org/jira/browse/SOLR-8292
> Project: Solr
>  Issue Type: Bug
>Reporter: Erick Erickson
>Assignee: Erick Erickson
> Attachments: SOLR-8292.patch
>
>
> This came to light in CDCR testing, which stresses this code a lot, there's a 
> stack trace showing this line (641 trunk) throwing an EOF exception:
> o = codec.readVal(fis);
> At first I thought to just wrap reading fis in a try/catch and return null, 
> but looking at the code a bit more I'm not so sure, that seems like it'd mask 
> what looks at first glance like a bug in the logic.
> A few lines earlier (633-4) there's these lines:
> // shouldn't currently happen - header and first record are currently written 
> at the same time
> if (fis.position() >= fos.size()) {
> Why are we comparing the the input file position against the size of the 
> output file? Maybe because the 'i' key is right next to the 'o' key? The 
> comment hints that it's checking for the ability to read the first record in 
> input stream along with the header. And perhaps there's a different issue 
> here because the expectation clearly is that the first record should be there 
> if the header is.
> So what's the right thing to do? Wrap in a try/catch and return null for EOF? 
> Change the test? Do both?
> I can take care of either, but wanted a clue whether the comparison of fis to 
> fos is intended.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8292) TransactionLog.next() does not honor contract and return null for EOF

2017-01-05 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803900#comment-15803900
 ] 

Cao Manh Dat commented on SOLR-8292:


Hi Erick, I'm not familiar with CDCR code much. But I will give it a try today. 
Do we have any test that re procedure this error?

> TransactionLog.next() does not honor contract and return null for EOF
> -
>
> Key: SOLR-8292
> URL: https://issues.apache.org/jira/browse/SOLR-8292
> Project: Solr
>  Issue Type: Bug
>Reporter: Erick Erickson
>Assignee: Erick Erickson
> Attachments: SOLR-8292.patch
>
>
> This came to light in CDCR testing, which stresses this code a lot, there's a 
> stack trace showing this line (641 trunk) throwing an EOF exception:
> o = codec.readVal(fis);
> At first I thought to just wrap reading fis in a try/catch and return null, 
> but looking at the code a bit more I'm not so sure, that seems like it'd mask 
> what looks at first glance like a bug in the logic.
> A few lines earlier (633-4) there's these lines:
> // shouldn't currently happen - header and first record are currently written 
> at the same time
> if (fis.position() >= fos.size()) {
> Why are we comparing the the input file position against the size of the 
> output file? Maybe because the 'i' key is right next to the 'o' key? The 
> comment hints that it's checking for the ability to read the first record in 
> input stream along with the header. And perhaps there's a different issue 
> here because the expectation clearly is that the first record should be there 
> if the header is.
> So what's the right thing to do? Wrap in a try/catch and return null for EOF? 
> Change the test? Do both?
> I can take care of either, but wanted a clue whether the comparison of fis to 
> fos is intended.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8292) TransactionLog.next() does not honor contract and return null for EOF

2017-01-05 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803488#comment-15803488
 ] 

Erick Erickson commented on SOLR-8292:
--

 [~caomanhdat] You've also been in the tlog code significantly recently, do you 
have any opinion  on whether this (and SOLR-4116) are valid any longer?

> TransactionLog.next() does not honor contract and return null for EOF
> -
>
> Key: SOLR-8292
> URL: https://issues.apache.org/jira/browse/SOLR-8292
> Project: Solr
>  Issue Type: Bug
>Reporter: Erick Erickson
>Assignee: Erick Erickson
> Attachments: SOLR-8292.patch
>
>
> This came to light in CDCR testing, which stresses this code a lot, there's a 
> stack trace showing this line (641 trunk) throwing an EOF exception:
> o = codec.readVal(fis);
> At first I thought to just wrap reading fis in a try/catch and return null, 
> but looking at the code a bit more I'm not so sure, that seems like it'd mask 
> what looks at first glance like a bug in the logic.
> A few lines earlier (633-4) there's these lines:
> // shouldn't currently happen - header and first record are currently written 
> at the same time
> if (fis.position() >= fos.size()) {
> Why are we comparing the the input file position against the size of the 
> output file? Maybe because the 'i' key is right next to the 'o' key? The 
> comment hints that it's checking for the ability to read the first record in 
> input stream along with the header. And perhaps there's a different issue 
> here because the expectation clearly is that the first record should be there 
> if the header is.
> So what's the right thing to do? Wrap in a try/catch and return null for EOF? 
> Change the test? Do both?
> I can take care of either, but wanted a clue whether the comparison of fis to 
> fos is intended.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8292) TransactionLog.next() does not honor contract and return null for EOF

2017-01-05 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15802475#comment-15802475
 ] 

Erick Erickson commented on SOLR-8292:
--

[~md...@cloudera.com][~rendel][~markrmil...@gmail.com] I originally assigned 
this one to myself to not lose track of it but haven't done anything else with 
it. Is it reasonable to close this? Perhaps SOLR-7478 or similar has fixed 
this? There's been a lot of hardening in the last year.

WDYT about closing SOLR-4116 too?

> TransactionLog.next() does not honor contract and return null for EOF
> -
>
> Key: SOLR-8292
> URL: https://issues.apache.org/jira/browse/SOLR-8292
> Project: Solr
>  Issue Type: Bug
>Reporter: Erick Erickson
>Assignee: Erick Erickson
> Attachments: SOLR-8292.patch
>
>
> This came to light in CDCR testing, which stresses this code a lot, there's a 
> stack trace showing this line (641 trunk) throwing an EOF exception:
> o = codec.readVal(fis);
> At first I thought to just wrap reading fis in a try/catch and return null, 
> but looking at the code a bit more I'm not so sure, that seems like it'd mask 
> what looks at first glance like a bug in the logic.
> A few lines earlier (633-4) there's these lines:
> // shouldn't currently happen - header and first record are currently written 
> at the same time
> if (fis.position() >= fos.size()) {
> Why are we comparing the the input file position against the size of the 
> output file? Maybe because the 'i' key is right next to the 'o' key? The 
> comment hints that it's checking for the ability to read the first record in 
> input stream along with the header. And perhaps there's a different issue 
> here because the expectation clearly is that the first record should be there 
> if the header is.
> So what's the right thing to do? Wrap in a try/catch and return null for EOF? 
> Change the test? Do both?
> I can take care of either, but wanted a clue whether the comparison of fis to 
> fos is intended.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8292) TransactionLog.next() does not honor contract and return null for EOF

2015-12-03 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037616#comment-15037616
 ] 

Renaud Delbru commented on SOLR-8292:
-

Perhaps related to SOLR-4116 ?

> TransactionLog.next() does not honor contract and return null for EOF
> -
>
> Key: SOLR-8292
> URL: https://issues.apache.org/jira/browse/SOLR-8292
> Project: Solr
>  Issue Type: Bug
>Reporter: Erick Erickson
>
> This came to light in CDCR testing, which stresses this code a lot, there's a 
> stack trace showing this line (641 trunk) throwing an EOF exception:
> o = codec.readVal(fis);
> At first I thought to just wrap reading fis in a try/catch and return null, 
> but looking at the code a bit more I'm not so sure, that seems like it'd mask 
> what looks at first glance like a bug in the logic.
> A few lines earlier (633-4) there's these lines:
> // shouldn't currently happen - header and first record are currently written 
> at the same time
> if (fis.position() >= fos.size()) {
> Why are we comparing the the input file position against the size of the 
> output file? Maybe because the 'i' key is right next to the 'o' key? The 
> comment hints that it's checking for the ability to read the first record in 
> input stream along with the header. And perhaps there's a different issue 
> here because the expectation clearly is that the first record should be there 
> if the header is.
> So what's the right thing to do? Wrap in a try/catch and return null for EOF? 
> Change the test? Do both?
> I can take care of either, but wanted a clue whether the comparison of fis to 
> fos is intended.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8292) TransactionLog.next() does not honor contract and return null for EOF

2015-11-18 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010699#comment-15010699
 ] 

Renaud Delbru commented on SOLR-8292:
-

I have checked on the cdcr code side, and whenever a log reader is used, it is 
by a single thread only. So the problem might be laying somewhere else.

> TransactionLog.next() does not honor contract and return null for EOF
> -
>
> Key: SOLR-8292
> URL: https://issues.apache.org/jira/browse/SOLR-8292
> Project: Solr
>  Issue Type: Bug
>Reporter: Erick Erickson
>
> This came to light in CDCR testing, which stresses this code a lot, there's a 
> stack trace showing this line (641 trunk) throwing an EOF exception:
> o = codec.readVal(fis);
> At first I thought to just wrap reading fis in a try/catch and return null, 
> but looking at the code a bit more I'm not so sure, that seems like it'd mask 
> what looks at first glance like a bug in the logic.
> A few lines earlier (633-4) there's these lines:
> // shouldn't currently happen - header and first record are currently written 
> at the same time
> if (fis.position() >= fos.size()) {
> Why are we comparing the the input file position against the size of the 
> output file? Maybe because the 'i' key is right next to the 'o' key? The 
> comment hints that it's checking for the ability to read the first record in 
> input stream along with the header. And perhaps there's a different issue 
> here because the expectation clearly is that the first record should be there 
> if the header is.
> So what's the right thing to do? Wrap in a try/catch and return null for EOF? 
> Change the test? Do both?
> I can take care of either, but wanted a clue whether the comparison of fis to 
> fos is intended.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8292) TransactionLog.next() does not honor contract and return null for EOF

2015-11-16 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006519#comment-15006519
 ] 

Renaud Delbru commented on SOLR-8292:
-

I have reviewed all the methods writing records to the tlog file, and all of 
them are properly synchronised with the flushing of the output stream. 
However, the access to the input stream is not synchronised. Could it be that 
one concurrent thread changed the fis position while another one was trying to 
read a record ? The CdcrLogReader#resetToLastPosition could interfere with the 
TransactionLog.LogReader#next.


> TransactionLog.next() does not honor contract and return null for EOF
> -
>
> Key: SOLR-8292
> URL: https://issues.apache.org/jira/browse/SOLR-8292
> Project: Solr
>  Issue Type: Bug
>Reporter: Erick Erickson
>
> This came to light in CDCR testing, which stresses this code a lot, there's a 
> stack trace showing this line (641 trunk) throwing an EOF exception:
> o = codec.readVal(fis);
> At first I thought to just wrap reading fis in a try/catch and return null, 
> but looking at the code a bit more I'm not so sure, that seems like it'd mask 
> what looks at first glance like a bug in the logic.
> A few lines earlier (633-4) there's these lines:
> // shouldn't currently happen - header and first record are currently written 
> at the same time
> if (fis.position() >= fos.size()) {
> Why are we comparing the the input file position against the size of the 
> output file? Maybe because the 'i' key is right next to the 'o' key? The 
> comment hints that it's checking for the ability to read the first record in 
> input stream along with the header. And perhaps there's a different issue 
> here because the expectation clearly is that the first record should be there 
> if the header is.
> So what's the right thing to do? Wrap in a try/catch and return null for EOF? 
> Change the test? Do both?
> I can take care of either, but wanted a clue whether the comparison of fis to 
> fos is intended.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8292) TransactionLog.next() does not honor contract and return null for EOF

2015-11-16 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006537#comment-15006537
 ] 

Renaud Delbru commented on SOLR-8292:
-

The convention over the log reader is that it should not be used by more than 
one thread (see comment on TransactionLog#getReader). I'll double check if the 
cdcr code is respecting that.

> TransactionLog.next() does not honor contract and return null for EOF
> -
>
> Key: SOLR-8292
> URL: https://issues.apache.org/jira/browse/SOLR-8292
> Project: Solr
>  Issue Type: Bug
>Reporter: Erick Erickson
>
> This came to light in CDCR testing, which stresses this code a lot, there's a 
> stack trace showing this line (641 trunk) throwing an EOF exception:
> o = codec.readVal(fis);
> At first I thought to just wrap reading fis in a try/catch and return null, 
> but looking at the code a bit more I'm not so sure, that seems like it'd mask 
> what looks at first glance like a bug in the logic.
> A few lines earlier (633-4) there's these lines:
> // shouldn't currently happen - header and first record are currently written 
> at the same time
> if (fis.position() >= fos.size()) {
> Why are we comparing the the input file position against the size of the 
> output file? Maybe because the 'i' key is right next to the 'o' key? The 
> comment hints that it's checking for the ability to read the first record in 
> input stream along with the header. And perhaps there's a different issue 
> here because the expectation clearly is that the first record should be there 
> if the header is.
> So what's the right thing to do? Wrap in a try/catch and return null for EOF? 
> Change the test? Do both?
> I can take care of either, but wanted a clue whether the comparison of fis to 
> fos is intended.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8292) TransactionLog.next() does not honor contract and return null for EOF

2015-11-14 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005479#comment-15005479
 ] 

Erick Erickson commented on SOLR-8292:
--

Bah! fis is a FastInputStream, not FileInputStream. I'll try to add a bit of 
logging and trigger this again to see if the size comparison is even relevant.

> TransactionLog.next() does not honor contract and return null for EOF
> -
>
> Key: SOLR-8292
> URL: https://issues.apache.org/jira/browse/SOLR-8292
> Project: Solr
>  Issue Type: Bug
>Reporter: Erick Erickson
>
> This came to light in CDCR testing, which stresses this code a lot, there's a 
> stack trace showing this line (641 trunk) throwing an EOF exception:
> o = codec.readVal(fis);
> At first I thought to just wrap reading fis in a try/catch and return null, 
> but looking at the code a bit more I'm not so sure, that seems like it'd mask 
> what looks at first glance like a bug in the logic.
> A few lines earlier (633-4) there's these lines:
> // shouldn't currently happen - header and first record are currently written 
> at the same time
> if (fis.position() >= fos.size()) {
> Why are we comparing the the input file position against the size of the 
> output file? Maybe because the 'i' key is right next to the 'o' key? The 
> comment hints that it's checking for the ability to read the first record in 
> input stream along with the header. And perhaps there's a different issue 
> here because the expectation clearly is that the first record should be there 
> if the header is.
> So what's the right thing to do? Wrap in a try/catch and return null for EOF? 
> Change the test? Do both?
> I can take care of either, but wanted a clue whether the comparison of fis to 
> fos is intended.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org