Re: Commit Within and /update/extract handler

2014-04-09 Thread Jamie Johnson
This is being triggered by adding the commitWithin param to
ContentStreamUpdateRequest (request.setCommitWithin(1);).  My
configuration has autoCommit max time of 15s and openSearcher set to false.
I'm assuming that changing openSearcher to true should address this, and
adding softCommit=true to the request would make the documents
available in the meantime?
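
For reference, commitWithin is expressed in milliseconds, so setCommitWithin(1) asks for a commit within 1 ms of the add. The sketch below only assembles the equivalent request URL with the standard library. The host, port and document id are hypothetical placeholders, the core name "forums" is taken from the log in this thread, and the class name ExtractRequestUrl is made up for illustration; it is not a SolrJ API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Illustrative helper (not part of SolrJ): builds an /update/extract URL with query parameters. */
final class ExtractRequestUrl {

    /** Joins URL-encoded key=value pairs onto the base URL, preserving insertion order. */
    static String build(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.id", "doc-1");   // hypothetical document id
        params.put("commitWithin", "1000");  // milliseconds; 1000 = one second
        // Hypothetical host and port; adjust to the actual servlet container.
        System.out.println(build("http://localhost:8983/solr/forums/update/extract", params));
    }
}
```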

On Apr 8, 2014 10:02 AM, Erick Erickson erickerick...@gmail.com wrote:

> Got a clue how it's being generated? Because it's not going to show
> you documents.
>
> commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
>
> openSearcher=false and softCommit=false so the documents will be
> invisible. You need one or the other set to true.
>
> What it will do is close the current segment, open a new one and
> truncate the current transaction log. These may be good things but
> they have nothing to do with making docs visible :).
>
> See:
> http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Best,
> Erick


Re: Commit Within and /update/extract handler

2014-04-09 Thread Shawn Heisey
On 4/9/2014 7:47 AM, Jamie Johnson wrote:
> This is being triggered by adding the commitWithin param to
> ContentStreamUpdateRequest (request.setCommitWithin(1);).  My
> configuration has autoCommit max time of 15s and openSearcher set to false.
> I'm assuming that changing openSearcher to true should address this, and
> adding softCommit=true to the request would make the documents
> available in the meantime?

My personal opinion: autoCommit should not be used for document
visibility, even though it CAN be used for it.  It belongs in every
config that uses the transaction log, with openSearcher set to false,
and carefully considered maxTime and/or maxDocs parameters.

I think it's better to control document visibility entirely manually,
but if you actually do want to have an automatic commit for document
visibility, use autoSoftCommit.  It doesn't make any sense to disable
openSearcher on a soft commit, so just leave that out.  The docs/time
intervals for this can be smaller or greater than the intervals for
autoCommit, depending on your needs.
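
The split described above can be sketched as a toy model: a hard autoCommit with openSearcher=false never exposes documents, while an autoSoftCommit does. This is a deliberate simplification, not Solr's implementation. It ignores maxDocs, searcher warm-up and manual commits, the names AutoCommitModel and Rule are invented for illustration, and the 1-second soft commit interval is a hypothetical value (only the 15-second autoCommit comes from this thread).

```java
import java.util.OptionalLong;

/** Toy model (not Solr code) of when a newly added document becomes searchable. */
final class AutoCommitModel {

    /** One automatic commit rule that fires maxTimeMs after the first pending update. */
    static final class Rule {
        final long maxTimeMs;
        final boolean soft;
        final boolean openSearcher;

        Rule(long maxTimeMs, boolean soft, boolean openSearcher) {
            this.maxTimeMs = maxTimeMs;
            this.soft = soft;
            this.openSearcher = openSearcher;
        }

        /** A commit makes documents visible if it is soft or opens a new searcher. */
        boolean exposesDocs() {
            return soft || openSearcher;
        }
    }

    /** Earliest delay after which some rule exposes the document; empty if none ever does. */
    static OptionalLong visibleAfterMs(Rule... rules) {
        OptionalLong best = OptionalLong.empty();
        for (Rule r : rules) {
            if (r.exposesDocs() && (!best.isPresent() || r.maxTimeMs < best.getAsLong())) {
                best = OptionalLong.of(r.maxTimeMs);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Rule hard = new Rule(15_000, false, false); // autoCommit from the thread: 15 s, openSearcher=false
        Rule soft = new Rule(1_000, true, false);   // hypothetical autoSoftCommit interval
        System.out.println(visibleAfterMs(hard).isPresent());       // hard commit alone
        System.out.println(visibleAfterMs(hard, soft).getAsLong()); // with a soft commit rule added
    }
}
```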

Any manual commits that you send probably should be soft commits, but
honestly that doesn't really matter if your auto settings are correct.

Thanks,
Shawn



Re: Commit Within and /update/extract handler

2014-04-09 Thread Jamie Johnson
Thanks Shawn, I appreciate the information.



Re: Commit Within and /update/extract handler

2014-04-08 Thread Erick Erickson
Got a clue how it's being generated? Because it's not going to show
you documents.

commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

openSearcher=false and softCommit=false so the documents will be
invisible. You need one or the other set to true.

What it will do is close the current segment, open a new one and
truncate the current transaction log. These may be good things but
they have nothing to do with making docs visible :).
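
The rule above (a commit exposes documents only if openSearcher or softCommit is true) can be checked mechanically against a logged commit line. The following is a minimal sketch using only the standard library. The class and method names are hypothetical, and the parser assumes the commit{,key=value,...} layout exactly as it appears in this thread's log.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: decide from a logged commit line whether the commit can expose new documents. */
final class CommitVisibility {

    /** Parses "commit{,k=v,k=v}" into a flag map; entries without '=' are skipped. */
    static Map<String, Boolean> parseFlags(String logLine) {
        String marker = "commit{";
        int open = logLine.indexOf(marker);
        int close = open < 0 ? -1 : logLine.indexOf('}', open);
        if (open < 0 || close < 0) {
            throw new IllegalArgumentException("no commit{...} block in: " + logLine);
        }
        Map<String, Boolean> flags = new LinkedHashMap<>();
        for (String pair : logLine.substring(open + marker.length(), close).split(",")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                flags.put(pair.substring(0, eq).trim(),
                        Boolean.parseBoolean(pair.substring(eq + 1).trim()));
            }
        }
        return flags;
    }

    /** True if either openSearcher or softCommit is set, per the rule stated in the thread. */
    static boolean makesDocsVisible(String logLine) {
        Map<String, Boolean> flags = parseFlags(logLine);
        return flags.getOrDefault("openSearcher", false)
                || flags.getOrDefault("softCommit", false);
    }

    public static void main(String[] args) {
        String fromThread = "commit{,optimize=false,openSearcher=false,waitSearcher=true,"
                + "expungeDeletes=false,softCommit=false,prepareCommit=false}";
        System.out.println(makesDocsVisible(fromThread)); // prints "false"
    }
}
```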

See:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick


Re: Commit Within and /update/extract handler

2014-04-07 Thread Erick Erickson
You say you see the commit happen in the log, is openSearcher
specified? This sounds like you're somehow getting a commit
with openSearcher=false...

Best,
Erick

On Sun, Apr 6, 2014 at 5:37 PM, Jamie Johnson jej2...@gmail.com wrote:
> I'm running solr 4.6.0 and am noticing that commitWithin doesn't seem to
> work when I am using the /update/extract request handler.  It looks like a
> commit is happening from the logs, but the documents don't become available
> for search until I do a commit manually.  Could this be some type of
> configuration issue?


Re: Commit Within and /update/extract handler

2014-04-07 Thread Erick Erickson
What does the call look like? Are you opening a new searcher
or not? That should be in the log line where the commit is recorded...

FWIW,
Erick

On Sun, Apr 6, 2014 at 5:37 PM, Jamie Johnson jej2...@gmail.com wrote:
> I'm running solr 4.6.0 and am noticing that commitWithin doesn't seem to
> work when I am using the /update/extract request handler.  It looks like a
> commit is happening from the logs, but the documents don't become available
> for search until I do a commit manually.  Could this be some type of
> configuration issue?


Re: Commit Within and /update/extract handler

2014-04-07 Thread Jamie Johnson
Below is the log showing what I believe to be the commit

07-Apr-2014 23:40:55.846 INFO [catalina-exec-5]
org.apache.solr.update.processor.LogUpdateProcessor.finish [forums]
webapp=/solr path=/update/extract
params={uprefix=attr_&literal.source_id=e4bb4bb6-96ab-4f8f-8a2a-1cf37dc1bcce&literal.content_group=File
&literal.id=e4bb4bb6-96ab-4f8f-8a2a-1cf37dc1bcce&literal.forum_id=3&literal.content_type=application/octet-stream&wt=javabin&literal.uploaded_by=+&version=2&literal.content_type=application/octet-stream&literal.file_name=exclusions}
{add=[e4bb4bb6-96ab-4f8f-8a2a-1cf37dc1bcce (1464785652471037952)]} 0 563
07-Apr-2014 23:41:10.847 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.DirectUpdateHandler2.commit start
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
07-Apr-2014 23:41:10.847 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[IW][commitScheduler-10-thread-1]: commit: start
07-Apr-2014 23:41:10.848 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[IW][commitScheduler-10-thread-1]: commit: enter lock
07-Apr-2014 23:41:10.848 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[IW][commitScheduler-10-thread-1]: commit: now prepare
07-Apr-2014 23:41:10.848 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[IW][commitScheduler-10-thread-1]: prepareCommit: flush
07-Apr-2014 23:41:10.849 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[IW][commitScheduler-10-thread-1]:   index before flush _y(4.6):C1
_10(4.6):C1 _11(4.6):C1 _12(4.6):C1
07-Apr-2014 23:41:10.849 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[DW][commitScheduler-10-thread-1]: commitScheduler-10-thread-1
startFullFlush
07-Apr-2014 23:41:10.849 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[DW][commitScheduler-10-thread-1]: anyChanges? numDocsInRam=1 deletes=true
hasTickets:false pendingChangesInFullFlush: false
07-Apr-2014 23:41:10.850 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[DWFC][commitScheduler-10-thread-1]: addFlushableState
DocumentsWriterPerThread [pendingDeletes=gen=0, segment=_14,
aborting=false, numDocsInRAM=1, deleteQueue=DWDQ: [ generation: 2 ]]
07-Apr-2014 23:41:10.852 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[DWPT][commitScheduler-10-thread-1]: flush postings as segment _14 numDocs=1
07-Apr-2014 23:41:10.904 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[DWPT][commitScheduler-10-thread-1]: new segment has 0 deleted docs
07-Apr-2014 23:41:10.904 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[DWPT][commitScheduler-10-thread-1]: new segment has no vectors; norms; no
docValues; prox; freqs
07-Apr-2014 23:41:10.904 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[DWPT][commitScheduler-10-thread-1]: flushedFiles=[_14.nvd,
_14_Lucene41_0.pos, _14_Lucene41_0.tip, _14_Lucene41_0.tim, _14.nvm,
_14.fdx, _14_Lucene41_0.doc, _14.fnm, _14.fdt]
07-Apr-2014 23:41:10.905 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[DWPT][commitScheduler-10-thread-1]: flushed codec=Lucene46
07-Apr-2014 23:41:10.905 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[DWPT][commitScheduler-10-thread-1]: flushed: segment=_14 ramUsed=0.122 MB
newFlushedSize(includes docstores)=0.003 MB docs/MB=322.937
07-Apr-2014 23:41:10.907 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[DW][commitScheduler-10-thread-1]: publishFlushedSegment seg-private
updates=null
07-Apr-2014 23:41:10.907 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[IW][commitScheduler-10-thread-1]: publishFlushedSegment
07-Apr-2014 23:41:10.907 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[BD][commitScheduler-10-thread-1]: push deletes  1 deleted terms (unique
count=1) bytesUsed=1024 delGen=4 packetCount=1 totBytesUsed=1024
07-Apr-2014 23:41:10.907 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[IW][commitScheduler-10-thread-1]: publish sets newSegment delGen=5
seg=_14(4.6):C1
07-Apr-2014 23:41:10.908 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[IFD][commitScheduler-10-thread-1]: now checkpoint _y(4.6):C1 _10(4.6):C1
_11(4.6):C1 _12(4.6):C1 _14(4.6):C1 [5 segments ; isCommit = false]
07-Apr-2014 23:41:10.908 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[IFD][commitScheduler-10-thread-1]: 0 msec to checkpoint
07-Apr-2014 23:41:10.908 INFO [commitScheduler-10-thread-1]
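
One thing that can be read off the log above is the delay between the add (23:40:55.846) and the start of the commit (23:41:10.847). The sketch below computes that gap from the two timestamps. The class name LogGap is hypothetical, and the timestamp pattern is an assumption based on how the entries are printed here. The result is consistent with the 15-second autoCommit maxTime mentioned elsewhere in the thread, though that reading is an inference rather than something the log states.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Sketch: measure the delay between two log timestamps in the format shown in this thread. */
final class LogGap {

    // Assumed pattern for stamps such as "07-Apr-2014 23:40:55.846".
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("dd-MMM-yyyy HH:mm:ss.SSS", Locale.ENGLISH);

    /** Milliseconds from the earlier stamp to the later one. */
    static long millisBetween(String earlier, String later) {
        return Duration.between(
                LocalDateTime.parse(earlier, STAMP),
                LocalDateTime.parse(later, STAMP)).toMillis();
    }

    public static void main(String[] args) {
        // Add logged by LogUpdateProcessor vs. "commit start" logged by DirectUpdateHandler2.
        System.out.println(millisBetween("07-Apr-2014 23:40:55.846", "07-Apr-2014 23:41:10.847")); // 15001
    }
}
```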

Commit Within and /update/extract handler

2014-04-06 Thread Jamie Johnson
I'm running solr 4.6.0 and am noticing that commitWithin doesn't seem to
work when I am using the /update/extract request handler.  It looks like a
commit is happening from the logs, but the documents don't become available
for search until I do a commit manually.  Could this be some type of
configuration issue?