Re: Results of Advanced Search: Missing Message

2013-02-26 Thread Tibor Simko
On Wed, 30 Jan 2013, Alexander Wagner wrote:
> Ah! :) I didn't know that one. Thanks for this pointer. It seems I'm
> on the safe side if I add &ap=0 whenever we need exact matches.

Yes, for this kind of migration check, ap=0 is the one to use.

> I understood this intention. I just wanted to point out that stripping
> off "seemingly useless punctuation" could cause trouble in the case of
> exact matching. Google shows this problem, e.g. if I search for
> identifier-like terms (specs, rep-no, stuff like that).

One can always fall back on exact phrase search and/or regexp search
in these cases, where the matching is `exacter' than what an average
user would prefer.

Best regards
--
Tibor Simko


Re: Results of Advanced Search: Missing Message

2013-02-26 Thread Tibor Simko
On Tue, 29 Jan 2013, Ludmila Marian wrote:
> I think advanced search was needed historically in order to be able
> to join together different types of searches: regex and exact phrase
> search or partial phrase and all-of-the-words, etc. Meanwhile, simple
> search has evolved and, as you are saying, there is nothing advanced
> search can do that simple search cannot reproduce.

Yes.  The main reason was the fact that Simple Search did not originally
support nested parentheses; it was doing left-to-right inclusion or
exclusion of search terms.  So one could not express all queries that
Advanced Search could in Simple Search.
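Tibor's point about left-to-right evaluation can be illustrated with a minimal Python sketch. It assumes each search term has already been resolved to a set of record IDs; the `hits` dictionary and its contents are hypothetical, and this is not how Invenio's query parser is implemented. Read strictly left to right, the flat query `a and b or c and d` means `((a and b) or c) and d`, which gives a different set from the grouped `(a and b) or (c and d)`; with purely left-to-right combination, the grouped form cannot be written at all without repeating a term.

```python
# Hypothetical hit sets: each search term already resolved to record IDs.
hits = {
    "a": {1, 2, 3},
    "b": {2, 3, 4},
    "c": {5, 6},
    "d": {6, 7},
}

# Set operations for the three ways of combining terms.
OPS = {
    "and": lambda x, y: x & y,
    "or": lambda x, y: x | y,
    "and not": lambda x, y: x - y,
}


def left_to_right(first, rest):
    """Combine terms strictly left to right: ((t0 op1 t1) op2 t2) ..."""
    result = set(hits[first])
    for op, term in rest:
        result = OPS[op](result, hits[term])
    return result


# "a and b or c and d" read strictly left to right: ((a and b) or c) and d
flat = left_to_right("a", [("and", "b"), ("or", "c"), ("and", "d")])

# "(a and b) or (c and d)" needs grouping that left-to-right cannot express
nested = OPS["or"](OPS["and"](hits["a"], hits["b"]),
                   OPS["and"](hits["c"], hits["d"]))

print(sorted(flat))    # [6]
print(sorted(nested))  # [2, 3, 6]
```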

> To overcome this, and also to keep the users who prefer advanced
> search happy, we proposed another search interface, called
> add-to-search (you might remember it from the Invenio workshop): the
> users will have the possibility of using the advanced search format to
> compose a more complicated query in the simple search

Live preview at .

Best regards
--
Tibor Simko


Re: Results of Advanced Search: Missing Message

2013-01-29 Thread Alexander Wagner

On 29.01.2013 16:32, Ludmila Marian wrote:

Hello Ludmila!

[...]

>> Similar use case: JuSER feeds our web pages. Usually, these are searches
>> like (cid:"inst-ID" and typ:"doctypeID" and pub:"year") and require
>> "return exactly". However, at the beginning of the year most of these
>> searches will yield empty results for some time, and if the search
>> algorithm throws in a "did you mean" and returns, say, results from
>> institute A in place of those for institute B just "because there is
>> something" (did you mean?) in A but not in B, this will cause trouble.


> Indeed, problems might arise due to nearest search terms, but the
> suggestions are mostly for web-interface users, to guide them to a
> better search. All other output formats, i.e. the non-HTML ones, will
> return an empty list.
> Actually, there are some ways you can instruct the search engine to do
> only exact search, and no nearest searches. One way is to use the 'ap'
> (alternative patterns) parameter; you can test its behaviour by adding
> &ap=0 or &ap=1.


Ah! :) I didn't know that one. Thanks for this pointer. It seems I'm
on the safe side if I add &ap=0 whenever we need exact matches.


> In any case, the patch that I proposed for the advanced search issue
> neither proposes nearest searches ('did you mean') nor does any query
> manipulation in order to get some results.


I understood this intention. I just wanted to point out that stripping
off "seemingly useless punctuation" could cause trouble in the case of
exact matching. Google shows this problem, e.g. if I search for
identifier-like terms (specs, rep-no, stuff like that).


--

Kind regards,

Alexander Wagner
Subject Specialist
Central Library
52425 Juelich

mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi




Forschungszentrum Juelich GmbH
52425 Juelich
Registered office: Juelich
Registered in the commercial register of the Dueren Local Court, No. HR B 3498
Chairman of the Supervisory Board: MinDir Dr. Karl Eugen Huthmacher
Management Board: Prof. Dr. Achim Bachem (Chairman),
Karsten Beneke (Deputy Chairman), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt




Re: Results of Advanced Search: Missing Message

2013-01-29 Thread Ludmila Marian

Hi Alexander!



On 01/29/2013 12:53 PM, Alexander Wagner wrote:

>> I have proposed a patch for this, for maint-1.1: a message will be
>> displayed saying that the boolean operation resulted in no hits, and we
>> will print each of the individual queries with their number of results
>> (so the user can choose to go only for a part of his query) - similar to
>> the response obtained from simple search. The patch will be integrated
>> shortly.


> This sounds reasonable.

> However, please keep one point in mind: "did you mean ... I'll display
> those results even if you didn't"-style query handling is great for
> "discovery-like" searches, i.e. "I don't know exactly what I'm looking
> for" (usually triggered by some carbon-based life form on OSI layer 8),
> so displaying some area that is more likely and produces hits is
> helpful.

> However, in many scenarios there is also a need for exact matches that
> do NOT display something else if the result is zero.

> To give an example: during migration we matched our old datasets against
> a number of external sources like PubMed, arXiv, INSPIRE and so on.
> Usually, we used the DOI as the key for this input. However, for some
> DOIs PubMed finds it helpful to return exactly one record that is
> entirely unrelated, due to a "did you mean" expansion. (Whenever I got
> two or more hits I disregarded the match.) This gave me a bunch of wrong
> associations even though I had a precise input parameter, as precise as
> a DOI can be.

> Note also that if you strip off / and . from a DOI-like entity you
> might produce a dupe that didn't exist with those in place. I stumbled
> upon this because we implemented a basic dupe detection in WebSubmit
> that just searches the local database by whatever DOI/PMID/arXiv ID we
> got as input. Originally, I used the string as such and did an "all
> fields" search, but this triggered wrong results due to the stripping.
> (I re-coded this part to search only in fields 0247_$a and 773__$a,
> with the identifier in double quotes. Now it seems fine.)

> Similar use case: JuSER feeds our web pages. Usually, these are searches
> like (cid:"inst-ID" and typ:"doctypeID" and pub:"year") and require
> "return exactly". However, at the beginning of the year most of these
> searches will yield empty results for some time, and if the search
> algorithm throws in a "did you mean" and returns, say, results from
> institute A in place of those for institute B just "because there is
> something" (did you mean?) in A but not in B, this will cause trouble.


Indeed, problems might arise due to nearest search terms, but the
suggestions are mostly for web-interface users, to guide them to a
better search. All other output formats, i.e. the non-HTML ones, will
return an empty list.
Actually, there are some ways you can instruct the search engine to do
only exact search, and no nearest searches. One way is to use the 'ap'
(alternative patterns) parameter; you can test its behaviour by adding
&ap=0 or &ap=1.


ap - alternative patterns (0=no, 1=yes).  In case no exact
 match is found, the search engine can try alternative
 patterns e.g. to replace non-alphanumeric characters by
 a boolean query.  ap defines if this is wanted.
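For scripted consumers such as the JuSER web-page feeds or the migration checks, the `ap` switch can be put straight into the request URL. The sketch below uses a hypothetical base URL and assumes the Invenio 1.x search parameter names `p` (query pattern) and `of` (output format, here `xm`, assumed to be a non-HTML MARCXML format); only `ap=0` comes from the discussion in this thread.

```python
from urllib.parse import urlencode

# Hypothetical Invenio instance; replace with your own search endpoint.
BASE_URL = "https://juser.example.org/search"


def exact_search_url(query, output_format="xm"):
    """Build a search URL that asks for exact matches only.

    ap=0 switches off alternative patterns. The non-HTML output format
    (assumed: of=xm) should come back empty instead of offering suggestions.
    """
    params = {"p": query, "ap": 0, "of": output_format}
    return BASE_URL + "?" + urlencode(params)


url = exact_search_url('cid:"inst-ID" and typ:"doctypeID" and pub:"year"')
print(url)
```

An empty result from such a URL can then be treated as "nothing published yet" rather than as a cue to display something else.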

Another option, which concerns only the display, is to customise
CFG_WEBSEARCH_DISPLAY_NEAREST_TERMS, which controls whether any
suggestions are displayed at all.


In any case, the patch that I proposed for the advanced search issue
neither proposes nearest searches ('did you mean') nor does any query
manipulation in order to get some results. It behaves exactly as in this
case:


(i.e. it just points out that the individual queries have results but
the boolean operation returned none: in this case the user knows for
sure that the search patterns are correct; it's just that there are no
papers written by both müller and wert).



Cheers,
Ludmila






Re: Results of Advanced Search: Missing Message

2013-01-29 Thread Alexander Wagner

On 29.01.2013 11:34, Ludmila Marian wrote:

Hello Ludmila!


> The bug that you reported is triggered in advanced search, when each of
> the individual queries returns results, but the boolean operation (in
> this case the intersection) does not return any results.


Ah. That's the point.


> I have proposed a patch for this, for maint-1.1: a message will be
> displayed saying that the boolean operation resulted in no hits, and we
> will print each of the individual queries with their number of results
> (so the user can choose to go only for a part of his query) - similar to
> the response obtained from simple search. The patch will be integrated
> shortly.


This sounds reasonable.

However, please keep one point in mind: "did you mean ... I'll display
those results even if you didn't"-style query handling is great for
"discovery-like" searches, i.e. "I don't know exactly what I'm looking
for" (usually triggered by some carbon-based life form on OSI layer 8),
so displaying some area that is more likely and produces hits is
helpful.

However, in many scenarios there is also a need for exact matches that
do NOT display something else if the result is zero.

To give an example: during migration we matched our old datasets against
a number of external sources like PubMed, arXiv, INSPIRE and so on.
Usually, we used the DOI as the key for this input. However, for some
DOIs PubMed finds it helpful to return exactly one record that is
entirely unrelated, due to a "did you mean" expansion. (Whenever I got
two or more hits I disregarded the match.) This gave me a bunch of wrong
associations even though I had a precise input parameter, as precise as
a DOI can be.
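The wrong single-hit matches suggest a defensive check on the client side: accept a match only when there is exactly one hit and that record really carries the DOI that was searched for. Below is a minimal Python sketch, assuming the hits have already been fetched and parsed into dicts with a hypothetical `doi` key. The comparison is case-insensitive because DOI names are case-insensitive for ASCII characters; the list of URL prefixes stripped before comparing is an illustrative assumption.

```python
def normalize_doi(doi):
    """Normalize a DOI for comparison: trim whitespace, strip a common
    URL or 'doi:' prefix, and lower-case it (DOIs are case-insensitive)."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


def accept_match(query_doi, hits):
    """Return the single matching record, or None.

    hits: records returned by an external source, each a dict with a
    hypothetical 'doi' key. Exactly one hit is not enough on its own:
    the record must also carry the DOI we searched for.
    """
    if len(hits) != 1:
        return None
    record = hits[0]
    if normalize_doi(record.get("doi", "")) != normalize_doi(query_doi):
        return None  # e.g. a "did you mean" expansion returned something else
    return record
```

For example, `accept_match("10.1000/XYZ123", [{"doi": "10.1000/xyz123"}])` returns the record, while a single hit carrying any other DOI is rejected.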

Note also that if you strip off / and . from a DOI-like entity you
might produce a dupe that didn't exist with those in place. I stumbled
upon this because we implemented a basic dupe detection in WebSubmit
that just searches the local database by whatever DOI/PMID/arXiv ID we
got as input. Originally, I used the string as such and did an "all
fields" search, but this triggered wrong results due to the stripping.
(I re-coded this part to search only in fields 0247_$a and 773__$a,
with the identifier in double quotes. Now it seems fine.)
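The false-duplicate effect can be reproduced in a few lines. The sketch below approximates an alternative pattern by splitting the identifier on non-alphanumeric characters and requiring all resulting tokens to be present. This is an assumption made for illustration, not Invenio's actual implementation, and the two identifiers and record IDs are hypothetical. Once the `/` and `.` are gone, two distinct identifiers built from the same tokens can no longer be told apart, while an exact comparison on the identifier field keeps them separate.

```python
import re

# Hypothetical local records: record ID -> identifier stored in 0247_$a.
records = {
    101: "10.1000/1.23",
    102: "10.1000/23.1",
}


def tokens(value):
    """Split on non-alphanumeric characters, roughly turning
    '10.1000/1.23' into the terms 10, 1000, 1 and 23."""
    return {t for t in re.split(r"[^0-9A-Za-z]+", value) if t}


def token_match(query):
    """Records containing every token of the query (order is ignored)."""
    wanted = tokens(query)
    return sorted(rid for rid, value in records.items() if wanted <= tokens(value))


def exact_match(query):
    """Records whose stored identifier equals the query exactly."""
    return sorted(rid for rid, value in records.items() if value == query)


print(token_match("10.1000/1.23"))  # [101, 102]: a false duplicate
print(exact_match("10.1000/1.23"))  # [101]
```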

Similar use case: JuSER feeds our web pages. Usually, these are searches
like (cid:"inst-ID" and typ:"doctypeID" and pub:"year") and require
"return exactly". However, at the beginning of the year most of these
searches will yield empty results for some time, and if the search
algorithm throws in a "did you mean" and returns, say, results from
institute A in place of those for institute B just "because there is
something" (did you mean?) in A but not in B, this will cause trouble.


> Regarding the advanced search being treated as simple search: this is a
> very good question :-)

[...]

> complicated query in the simple search - in this way they will also
> 'see' how the query is formed (what we use for regex search or exact
> phrase search, etc.).


This will be great. If it is available soon I'll happily wait for it. :)
If it's a major task I'd suggest rewriting the internals of adv. search
(perform_request_search, I think) to simply rebase it on a simple search
with the same logic. I think this also solves Ferran's observations.


> I am currently re-basing this branch and preparing it for integration to
> Invenio (and also deployment to CDS, so you can see it 'in action' in a
> few days.)


:) It seems it will be available shortly, so we might get it if we move
up to 1.x (with x > 0). This upgrade will likely happen as soon as the
current evaluation period is done and GSI, DESY and probably RWTH are up
and running as well. I dare not touch JuSER during this evaluation
stuff. This has to do with the simple fact that we got ~1100 websubmits
to JuSER in 2012 and another ~1200 in 2013 till now. @Samuele: that is
the main reason why I do not just apply a 1.1 update to fix the OAI
server. If I break something now I can set up my tent here on campus.
Currently, amongst other things, it's a bit cold for this in Jülich ;)

--

Kind regards,

Alexander Wagner




Re: Results of Advanced Search: Missing Message

2013-01-29 Thread Ludmila Marian

Hi Alexander!

Happy new year!

The bug that you reported is triggered in advanced search, when each of
the individual queries returns results, but the boolean operation (in
this case the intersection) does not return any results. I have proposed
a patch for this, for maint-1.1: a message will be displayed saying that
the boolean operation resulted in no hits, and we will print each of the
individual queries with their number of results (so the user can choose
to go only for a part of his query) - similar to the response obtained
from simple search. The patch will be integrated shortly.
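The behaviour described for the patch (report per-term hit counts when the intersection is empty) can be sketched as follows. This is not the maint-1.1 patch itself. The function name, the message wording and the hit sets, which stand in for the author searches for müller and wert, are all hypothetical.

```python
# Hypothetical hit sets for two advanced-search rows (record IDs made up).
term_hits = {
    "author:müller": {1, 4, 9},
    "author:wert": {2, 7},
}


def search_and_explain(term_hits):
    """Intersect the per-term hit sets. If the intersection is empty, return
    a message plus each term's hit count instead of a silent empty page."""
    result = set.intersection(*term_hits.values())
    if result:
        return sorted(result), []
    messages = ["The boolean operation resulted in no hits."]
    for term, hits in term_hits.items():
        messages.append(f"  {term}: {len(hits)} hits")
    return [], messages


found, messages = search_and_explain(term_hits)
print(found)  # []
print("\n".join(messages))
```

Listing the per-term counts tells the user that each pattern is fine on its own and that only the combination is empty, so they can drop part of the query.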


Regarding the advanced search being treated as simple search: this is a 
very good question :-)


I think advanced search was needed historically in order to be able to
join together different types of searches: regex and exact phrase search
or partial phrase and all-of-the-words, etc. Meanwhile, simple search
has evolved and, as you are saying, there is nothing advanced search can
do that simple search cannot reproduce. Moreover, advanced search is
limited compared with simple search: if you would like to compose a
complicated search pattern with more than three types of searches, you
cannot do this with the advanced search interface.
To overcome this, and also to keep the users who prefer advanced search
happy, we proposed another search interface, called add-to-search (you
might remember it from the Invenio workshop): the users will have the
possibility of using the advanced search format to compose a more
complicated query in the simple search - in this way they will also
'see' how the query is formed (what we use for regex search or exact
phrase search, etc.).
I am currently re-basing this branch and preparing it for integration to 
Invenio (and also deployment to CDS, so you can see it 'in action' in a 
few days.)


Cheers,
Ludmila


--
Ludmila Marian ** CERN Document Server ** 



Re: Results of Advanced Search: Missing Message

2013-01-16 Thread Ferran Jorba
Hello Alexander,

[...]
> Unfortunately, I fear I have to start the year with a bug report. One
> of our users just noted that if you use the advanced search and enter
> a query without any results you do not get any notification in certain
> circumstances. E.g. http://goo.gl/gsJ3p (searching müller and wert in
> the author index of JuSER) triggers this behaviour. However, if I fill
> in only one field it seems to report properly. I think this should be
> fixed.

We have also been bitten by this bug on our test 1.1 installation: the
same arguments give different results in simple and advanced search.  I
searched my mail and saw that there has been no answer.  Yet ;-)

Thanks,

Ferran


Results of Advanced Search: Missing Message

2013-01-08 Thread Alexander Wagner

Hi!

Happy new year to all! :)

Unfortunately, I fear I have to start the year with a bug report. One of
our users just noticed that if you use the advanced search and enter a
query without any results, you do not get any notification in certain
circumstances. E.g. http://goo.gl/gsJ3p (searching müller and wert in
the author index of JuSER) triggers this behaviour. However, if I fill
in only one field it seems to report properly. I think this should be fixed.

On this occasion I wonder whether the advanced search shouldn't just
create a simple search with the proper fields instead of using its own
code. As far as I can see, the above example is (or at least should be ;)
identical to

   author:müller and author:wert

in simple search. And as far as I can see, there is (taking the /, " and '
modifiers into account) nothing that adv. search can perform that simple
search can't.
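The suggestion to have advanced search build a simple-search query can be sketched as a small translation function. The row structure `(operator, pattern, field, mode)` and the mode names are hypothetical stand-ins for the advanced search form, not Invenio's actual parameters; the quoting follows the `"`, `'` and `/` modifiers mentioned above. Rows are combined left to right, with explicit parentheses from the third row on, so the result does not depend on how the simple search ranks `and` against `or`.

```python
def format_term(pattern, field, mode):
    """Render one (hypothetical) advanced-search row as a simple-search term."""
    prefix = f"{field}:" if field else ""
    if mode == "exact":       # exact phrase -> "..."
        return f'{prefix}"{pattern}"'
    if mode == "partial":     # partial phrase -> '...'
        return f"{prefix}'{pattern}'"
    if mode == "regexp":      # regular expression -> /.../
        return f"{prefix}/{pattern}/"
    if mode == "all":         # all of the words -> field:w1 and field:w2
        words = [prefix + w for w in pattern.split()]
        return words[0] if len(words) == 1 else "(" + " and ".join(words) + ")"
    raise ValueError(f"unknown match mode: {mode!r}")


def advanced_to_simple(rows):
    """Combine rows strictly left to right into one simple-search query.

    rows: list of (operator, pattern, field, mode); the first operator is
    ignored and rows with an empty pattern are skipped.
    """
    query, count = "", 0
    for op, pattern, field, mode in rows:
        pattern = pattern.strip()
        if not pattern:
            continue
        term = format_term(pattern, field, mode)
        if count == 0:
            query = term
        elif count == 1:
            query = f"{query} {op} {term}"
        else:
            query = f"({query}) {op} {term}"  # keep left-to-right grouping
        count += 1
    return query


print(advanced_to_simple([
    (None, "müller", "author", "all"),
    ("and", "wert", "author", "all"),
]))  # author:müller and author:wert
```

Because the result is an ordinary simple-search string, it could also be shown back to the user, in the spirit of the add-to-search interface discussed in this thread.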

On this occasion I'd like to mention that I got feedback from users that
they actually prefer the advanced search in many cases instead of just
keying in the above term. Not that I understand it, it's just users'
feedback. And that even though we added a lot of logical fields with
just three-letter names.

--

Kind regards,

Alexander Wagner