On 28/09/2011 08:35, Jean-Francois Dockes wrote:
Denis Prost writes:
> Attached are 4 log files :
> * one from "recoll -t -q gazette" (155 results)
> * one from recollrunner with the same query (only "default query
> language" checked in recollrunner config) (3 results : only the
> ones among the 155 which do not contain spaces in their paths)
> * one from "recoll -t -f -q gazette" (46 results)
> * one from recollrunner with the same query ("default query language"
> and "match filenames" both checked in recollrunner config) (0
> results)
>
> I hope this helps solve the issue.
> Regards
> Denis
Thanks a lot for the log files. My comments are below.
First:
> :4:../rcldb/rcldb.cpp:1525:Rcl::Db::filenameWildExp: pattern: [*gazette*]
My guess is that this is from the 3rd query (recoll -t -f -q gazette). The
"-q" which would specify a "query language" query is ignored (because of how
the options are parsed), and this is a filename query where gazette is
transformed to *gazette* because it is neither capitalized nor contains
wildcards. It is supposed to return all documents with [gazette] as part of
their file name.
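
The wrapping rule described above can be sketched as follows. This is only
an illustration of the behaviour seen in the log lines, not Recoll's actual
filenameWildExp code; the function name and the exact set of characters
treated as wildcards (*, ? and [) are assumptions.

```python
def filename_wild_exp(pattern: str) -> str:
    """Illustrative sketch: wrap a bare file name pattern in '*' wildcards.

    Assumption: a pattern is left alone when it starts with an upper-case
    letter or already contains one of the wildcard characters * ? [
    """
    has_wildcard = any(ch in pattern for ch in "*?[")
    is_capitalized = pattern[:1].isupper()
    if has_wildcard or is_capitalized:
        return pattern
    return f"*{pattern}*"


# A bare word is wrapped, so it matches anywhere in the file name.
print(filename_wild_exp("gazette"))
# Stray quotes become part of the pattern, as in the [*'gazette'*] log line.
print(filename_wild_exp("'gazette'"))
```

With the extra single quotes the pattern only matches file names that
literally contain 'gazette' with the quotes, which would explain an empty
result.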
Second:
> :4:../rcldb/searchdata.cpp:782:StringToXapianQ:: query string: [gazette]
This is from [recoll -t -q gazette], which is a regular text search query,
returning all documents with gazette or a derivative ([gazettes]) in the
contents, or possibly in the file name field processed as text.
Third:
> :4:../rcldb/searchdata.cpp:782:StringToXapianQ:: query string: ['gazette']
This is probably from recollrunner with only 'default query language'
checked: there is excessive quoting, but it doesn't hurt much because this
is a full text search and the quotes get eliminated. I don't know why
recollrunner returns few results, but as you mention that these are only
the ones without spaces in the file name, I'd suspect a problem parsing the
output from recoll.
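
To illustrate the kind of parsing problem I suspect, here is a minimal
sketch in Python. It assumes that each result line of "recoll -t" is
tab-separated, with the URL and the title in square brackets, followed by
the size and the word "bytes". That layout, the sample line and the
function name are assumptions for the example, and I do not know how
recollrunner actually parses the output.

```python
import re

# Assumed line layout: mimetype, [url], [title], size, "bytes", separated
# by tabs. Matching on tabs keeps spaces inside the URL and title intact.
RESULT_LINE = re.compile(
    r"^(?P<mime>[^\t]+)\t"
    r"\[(?P<url>[^\t]*)\]\t"
    r"\[(?P<title>[^\t]*)\]\t"
    r"(?P<size>\d+)\tbytes"
)


def parse_result_line(line: str):
    """Return a dict for one result line, or None if it does not match."""
    m = RESULT_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    url = m["url"]
    # Only strip the scheme; no further decoding is attempted here.
    path = url[len("file://"):] if url.startswith("file://") else None
    return {
        "mime": m["mime"],
        "url": url,
        "path": path,
        "title": m["title"],
        "size": int(m["size"]),
    }


# Hypothetical sample line with spaces in the path.
sample = ("application/pdf\t[file:///home/denis/my docs/gazette 1.pdf]"
          "\t[La Gazette]\t12345\tbytes\t")
print(parse_result_line(sample)["path"])
```

Splitting the same line on any whitespace instead would cut the path into
several fields, which is consistent with only the space-free paths
surviving.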
Fourth:
> :4:../rcldb/rcldb.cpp:1525:Rcl::Db::filenameWildExp: pattern: [*'gazette'*]
This is with recollrunner, "match filenames" and "default query language"
checked. "Match filenames" takes precedence and the query fails because of the
excessive quoting.
The only thing that I find strange in the logs is that the 3rd one seems to
indicate that the query actually returns more results than the 1st one,
when I would have thought that they are identical. But the quoting may have
affected the query. The actual Xapian query is truncated in the log for
some reason, so we can't be sure:
:4:../rcldb/rclquery.cpp:237:Query::SetQuery: Q: ((gazette:(wqf=11) OR gazettes
OR gazet:4:../rcldb/rclquery.cpp:344:Fetching for first 50, count 50
So I think that the first fixes should be for recollrunner to:
- Avoid excessive single quote quoting
- Indicate somehow that "query language" and "file name search" are
different and exclusive modes.
- Try to better parse the query output when there are spaces in the file
names.
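
As a rough sketch of the first two points, assuming recollrunner could
build the command in Python (I have not checked what language it is written
in), the query can be passed as a single argument in an argument list, with
no shell and no added quotes, and the two modes kept mutually exclusive.
The names SearchMode and build_recoll_command are made up for the example;
only the -t, -q and -f options come from the commands discussed above.

```python
import subprocess
from enum import Enum


class SearchMode(Enum):
    """The two modes are exclusive: pick one, never both."""
    QUERY_LANGUAGE = "-q"
    FILE_NAME = "-f"


def build_recoll_command(query: str, mode: SearchMode) -> list:
    """Build an argv list; the query is one element, passed unquoted."""
    return ["recoll", "-t", mode.value, query]


def run_recoll(query: str, mode: SearchMode) -> str:
    """Run recoll without a shell, so no extra quoting is ever needed."""
    cmd = build_recoll_command(query, mode)
    done = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return done.stdout


print(build_recoll_command("gazette", SearchMode.QUERY_LANGUAGE))
print(build_recoll_command("gazette", SearchMode.FILE_NAME))
```

Because no shell is involved, a query containing spaces stays a single
argument, and the single quotes seen in the logs never reach recoll.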
And then we may get into possible Recoll issues. I'd be quite interested,
though, in the logs from the following 2 commands:
recoll -t -q gazette
recoll -t -q "'gazette'"
Here are the two logs:
* recoll -t -q gazette.log (same as already sent)
* recoll -t -q "gazette".log
Regards,
Denis
:4:../common/rclinit.cpp:106:rclinit: idxflushmb=10, set XAPIAN_FLUSH_THRESHOLD to 10E6
:4:../rcldb/rcldb.cpp:593:Db::open: m_isopen 0 m_iswritable 0
:4:../rcldb/stoplist.cpp:52:StopList::StopList: file_to_string(/home/denis/.recoll/stoplist.txt) failed: open/stat: errno: 2 :
:4:../query/wasatorcl.cpp:139:wasaQueryToRcl: leaf clause []:[gazette] slack 0
:4:../rcldb/rclquery.cpp:174:Query::setQuery:
:4:../rcldb/searchdata.cpp:782:StringToXapianQ:: query string: [gazette]
:5:../rcldb/searchdata.cpp:803:strToXapianQ: phrase/word: [gazette]
:5:../rcldb/searchdata.cpp:835:strToXapianQ: termcount: 1
:4:../rcldb/stemdb.cpp:272:stemExpand:english: [gazette] stem-> [gazett]
:5:../rcldb/stemdb.cpp:278:stemExpand: /home/denis/.recoll/xapiandb/stem_english lastdocid: 71147
:5:../rcldb/stemdb.cpp:314:stemExpand:english: gazett -> [gazette] [gazettes] [gazett]
:4:../rcldb/rclquery.cpp:237:Query::SetQuery: Q: ((gazette:(wqf=11) OR gazettes OR gazett))
:4:../rcldb/rclquery.cpp:315:Query::getResCnt: 1 mS
:4:../rcldb/rclquery.cpp:344:Fetching for first 50, count 50
:4:../rcldb/rclquery.cpp:344:Fetching for first 100, count 50
:4:../rcldb/rclquery.cpp:344:Fetching for first 150, count 50
:4:../rcldb/rclquery.cpp:344:Fetching for first 153, count 50
:4:../rcldb/rclquery.cpp:355:enquire->get_mset: got empty result
:5:../rcldb/searchdata.cpp:394:SearchData::erase
:4:../rcldb/rcldb.cpp:572:Db::~Db: isopen 1 m_iswritable 0
:4:../rcldb/rcldb.cpp:687:Db::i_close(1): m_isopen 1 m_iswritable 0
:4:../common/rclinit.cpp:106:rclinit: idxflushmb=10, set XAPIAN_FLUSH_THRESHOLD to 10E6
:4:../rcldb/rcldb.cpp:593:Db::open: m_isopen 0 m_iswritable 0
:4:../rcldb/stoplist.cpp:52:StopList::StopList: file_to_string(/home/denis/.recoll/stoplist.txt) failed: open/stat: errno: 2 :
:4:../query/wasatorcl.cpp:139:wasaQueryToRcl: leaf clause []:[gazette] slack 0
:4:../rcldb/rclquery.cpp:174:Query::setQuery:
:4:../rcldb/searchdata.cpp:782:StringToXapianQ:: query string: [gazette]
:5:../rcldb/searchdata.cpp:803:strToXapianQ: phrase/word: [gazette]
:5:../rcldb/searchdata.cpp:835:strToXapianQ: termcount: 1
:4:../rcldb/stemdb.cpp:272:stemExpand:english: [gazette] stem-> [gazett]
:5:../rcldb/stemdb.cpp:278:stemExpand: /home/denis/.recoll/xapiandb/stem_english lastdocid: 71147
:5:../rcldb/stemdb.cpp:314:stemExpand:english: gazett -> [gazette] [gazettes] [gazett]
:4:../rcldb/rclquery.cpp:237:Query::SetQuery: Q: ((gazette:(wqf=11) OR gazettes OR gazett))
:4:../rcldb/rclquery.cpp:315:Query::getResCnt: 0 mS
:4:../rcldb/rclquery.cpp:344:Fetching for first 50, count 50
:4:../rcldb/rclquery.cpp:344:Fetching for first 100, count 50
:4:../rcldb/rclquery.cpp:344:Fetching for first 150, count 50
:4:../rcldb/rclquery.cpp:344:Fetching for first 153, count 50
:4:../rcldb/rclquery.cpp:355:enquire->get_mset: got empty result
:5:../rcldb/searchdata.cpp:394:SearchData::erase
:4:../rcldb/rcldb.cpp:572:Db::~Db: isopen 1 m_iswritable 0
:4:../rcldb/rcldb.cpp:687:Db::i_close(1): m_isopen 1 m_iswritable 0