Modelling Access Control

2010-10-23 Thread Paul Carey
Hi

My domain model is made of users that have access to projects which
are composed of items. I'm hoping to use Solr and would like to make
sure that searches only return results for items that users have
access to.

I've looked over some of the older posts on this mailing list about
access control and saw a suggestion along the lines of
acl:user_id AND (actual query).

While this obviously works, there are a couple of niggles. Every item
must have a list of valid user ids (typically less than 100 in my
case). Every time a collaborator is added to or removed from a
project, I need to update every item in that project. This will
typically be fewer than 1000 items, so I guess it's no big deal.

I wondered if the following might be a reasonable alternative,
assuming the number of projects to which a user has access is lower
than a certain bound.
(acl:project_id OR acl:project_id OR ... ) AND (actual query)

When the numbers are small - e.g. each user has access to ~20 projects
and each project has ~20 collaborators - is one approach preferable
over another? And when outliers exist - e.g. a project with 2000
collaborators, or a user with access to 2000 projects - is one
approach more liable to fail than the other?

Many thanks

Paul


Re: A bug in ComplexPhraseQuery ?

2010-10-23 Thread jmr


iorixxx wrote:
 
 <queryParser name="complexphrase"
     class="org.apache.solr.search.ComplexPhraseQParserPlugin">
   <bool name="inOrder">false</bool>
 </queryParser>
 
 I added this change to SOLR-1604, can you test it and give us feedback?
 

Many thanks. I'll test this quite soon and let you know.
J-Michel


Re: xpath processing

2010-10-23 Thread Ben Boggess
 processor="FileListEntityProcessor" fileName=".*xml" recursive="true"

Shouldn't this be fileName="*.xml"?

Ben

On Oct 22, 2010, at 10:52 PM, pghorp...@ucla.edu wrote:

 
 
 <dataConfig>
 <dataSource name="myfilereader" type="FileDataSource"/>
 <document>
 <entity name="f" rootEntity="false" dataSource="null"
 processor="FileListEntityProcessor" fileName=".*xml" recursive="true"
 baseDir="C:\data\sample_records\mods\starr">
 <entity name="x" dataSource="myfilereader" processor="XPathEntityProcessor"
 url="${f.fileAbsolutePath}" stream="false" forEach="/mods"
 transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
 <field column="id" template="${f.file}"/>
 <field column="collectionKey" template="starr"/>
 <field column="collectionName" template="starr"/>
 <field column="fileAbsolutePath" template="${f.fileAbsolutePath}"/>
 <field column="fileName" template="${f.file}"/>
 <field column="fileSize" template="${f.fileSize}"/>
 <field column="fileLastModified" template="${f.fileLastModified}"/>
 <field column="classification_keyword" xpath="/mods/classification"/>
 <field column="accessCondition_keyword" xpath="/mods/accessCondition"/>
 <field column="nameNamePart_s" xpath="/mods/name/namePart[@type = 'date']"/>
 </entity>
 </entity>
 </document>
 </dataConfig>
 
 Quoting Ken Stanley doh...@gmail.com:
 
 Parinita,
 
 In its simplest form, what does your entity definition for DIH look like;
 also, what does one record from your xml look like? We need more information
 before we can really be of any help. :)
 
 - Ken
 
 It looked like something resembling white marble, which was
 probably what it was: something resembling white marble.
-- Douglas Adams, The Hitchhiker's Guide to the Galaxy
 
 
 On Fri, Oct 22, 2010 at 8:00 PM, pghorp...@ucla.edu wrote:
 
 Quoting pghorp...@ucla.edu:
 Can someone help me please?
 
 
 I am trying to import mods xml data in solr using the xml/http datasource
 
 This does not work with XPathEntityProcessor of the data import handler
 xpath="/mods/name/namePart[@type = 'date']"
 
 I actually have 143 records with type attribute as 'date' for element
 namePart.
 
 Thank you
 Parinita
 
 
 
 
 
 
 


Re: Spatial

2010-10-23 Thread Grant Ingersoll

On Oct 20, 2010, at 12:14 PM, Pradeep Singh wrote:

 Thanks for your response Grant.
 
 I already have the bounding box based implementation in place. And on a
 document base of around 350K it is super fast.
 
 What about a document base of millions of documents? While a tier based
 approach will narrow down the document space significantly this concern
 might be misplaced because there are other numeric range queries I am going
 to run anyway which don't have anything to do with spatial query. But the
 keyword here is numeric range query based on NumericField, which is going to
 be significantly faster than regular number based queries. I see that the
 dynamic field type _latLon is of type double and not tdouble by default. Can
 I have your input about that decision?

It's just an example.  There shouldn't be any problem with using tdouble (or
tfloat if you don't need the precision).


 
 -Pradeep
 
 On Tue, Oct 19, 2010 at 6:10 PM, Grant Ingersoll gsing...@apache.orgwrote:
 
 
 On Oct 19, 2010, at 6:23 PM, Pradeep Singh wrote:
 
 https://issues.apache.org/jira/browse/LUCENE-2519
 
 If I change my code as per 2519
 
 to have this  -
 
 public double[] coords(double latitude, double longitude) {
   double rlat = Math.toRadians(latitude);
   double rlong = Math.toRadians(longitude);
   double nlat = rlong * Math.cos(rlat);
   return new double[]{nlat, rlong};
 
 }
 
 
 return this -
 
 x = (gamma - gamma[0]) cos(phi)
 y = phi
 
 would it make it give correct results? Correct projections, tier ids?
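
(For concreteness, a sketch of the projection those two equations describe;
a hypothetical standalone helper, not the actual LUCENE-2519 code:)

  // Sinusoidal projection. gamma = longitude in radians, phi = latitude in
  // radians, gamma0 = the projection's central meridian in radians.
  public static double[] sinusoidal(double latitude, double longitude, double lon0) {
      double phi = Math.toRadians(latitude);
      double gamma = Math.toRadians(longitude);
      double gamma0 = Math.toRadians(lon0);
      double x = (gamma - gamma0) * Math.cos(phi);  // x = (gamma - gamma0) cos(phi)
      double y = phi;                               // y = phi
      return new double[]{x, y};
  }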
 
 I'm not sure.  I have a lot of doubt around that code.  After making that
 correction, I spent several days trying to get the tests to pass and
 ultimately gave up.  Does that mean it is wrong?  I don't know.  I just
 don't have enough confidence to recommend it, given that the tests I was
 asking it to do I could verify through other tools.  Personally, I would
 recommend seeing if one of the non-tier based approaches suffices for your
 situation and use that.
 
 -Grant

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search



Re: Import From MYSQL database

2010-10-23 Thread do3do3

What I know is: define your fields in the schema.xml file, build a
database_conf.xml file which contains the identification for your database,
and finally define the DataImportHandler in the solrconfig.xml file.
I put a sample of what you should do in the first post in this topic; you can
check it. If I learn any additional information I will tell you.
Good luck
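
(For illustration, a minimal sketch of those three pieces, with hypothetical
table and column names:)

  <!-- solrconfig.xml: register the handler and point it at the config -->
  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">database_conf.xml</str>
    </lst>
  </requestHandler>

  <!-- database_conf.xml: identify the database and map rows to fields -->
  <dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost/mydb" user="solr" password="secret"/>
    <document>
      <entity name="item" query="SELECT id, name FROM item">
        <field column="id" name="id"/>
        <field column="name" name="name"/>
      </entity>
    </document>
  </dataConfig>

The id and name fields must also be declared in schema.xml; a full import is
then triggered with /dataimport?command=full-import.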


Re: Import From MYSQL database

2010-10-23 Thread do3do3

I found these files but I can't find any useful info inside them; what I found
is the GET command in the HTTP request.


Re: How to index long words with StandardTokenizerFactory?

2010-10-23 Thread Sergey Bartunov
Here are all the files: http://rghost.net/3016862

1) StandardAnalyzer.java, StandardTokenizer.java - patched files from
lucene-2.9.3
2) I patch these files and build lucene by typing ant
3) I replace lucene-core-2.9.3.jar in solr/lib/ by my
lucene-core-2.9.3-dev.jar that I'd just compiled
4) then I do ant compile and ant dist in the solr folder
5) after that I recompile solr/example/webapps/solr.war with my new
solr and lucene-core jars
6) I put my schema.xml in solr/example/solr/conf/
7) then I do java -jar start.jar in solr/example
8) index big_post.xml
9) try to find this document by curl
http://localhost:8983/solr/select?q=body:big* (big_post.xml contains
a long word biga...)
10) solr returns nothing

On 23 October 2010 02:43, Steven A Rowe sar...@syr.edu wrote:
 Hi Sergey,

 What does your ~34kb field value look like?  Does StandardTokenizer think 
 it's just one token?

 What doesn't work?  What happens?

 Steve

 -Original Message-
 From: Sergey Bartunov [mailto:sbos@gmail.com]
 Sent: Friday, October 22, 2010 3:18 PM
 To: solr-user@lucene.apache.org
 Subject: Re: How to index long words with StandardTokenizerFactory?

 I'm using Solr 1.4.1. Now I've succeeded in replacing the lucene-core jar
 but maxTokenValue seems to be used in a very strange way. Currently for
 me it's set to 1024*1024, but I couldn't index a field with just a size
 of ~34kb. I understand that it's a little weird to index such big
 data, but I just want to know why it doesn't work

 On 22 October 2010 20:36, Steven A Rowe sar...@syr.edu wrote:
  Hi Sergey,
 
  I've opened an issue to add a maxTokenLength param to the
 StandardTokenizerFactory configuration:
 
         https://issues.apache.org/jira/browse/SOLR-2188
 
  I'll work on it this weekend.
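
(Presumably the configuration would end up looking something like this
sketch; hypothetical syntax until SOLR-2188 lands:)

  <fieldType name="text_long" class="solr.TextField">
    <analyzer>
      <!-- raise the 255-character default so very long tokens are kept -->
      <tokenizer class="solr.StandardTokenizerFactory" maxTokenLength="65536"/>
    </analyzer>
  </fieldType>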
 
  Are you using Solr 1.4.1?  I ask because of your mention of Lucene
 2.9.3.  I'm not sure there will ever be a Solr 1.4.2 release.  I plan on
 targeting Solr 3.1 and 4.0 for the SOLR-2188 fix.
 
  I'm not sure why you didn't get the results you wanted with your Lucene
 hack - is it possible you have other Lucene jars in your Solr classpath?
 
  Steve
 
  -Original Message-
  From: Sergey Bartunov [mailto:sbos@gmail.com]
  Sent: Friday, October 22, 2010 12:08 PM
  To: solr-user@lucene.apache.org
  Subject: How to index long words with StandardTokenizerFactory?
 
  I'm trying to force solr to index words whose length is more than 255
  symbols (this constant is DEFAULT_MAX_TOKEN_LENGTH in lucene
  StandardAnalyzer.java) using StandardTokenizerFactory as the 'filter' tag
  in the schema configuration XML. Specifying the maxTokenLength attribute
  won't work.
 
  I tried a dirty hack: I downloaded the lucene-core-2.9.3 src
  and changed the DEFAULT_MAX_TOKEN_LENGTH to 100, built it into a jar
  and replaced the original lucene-core jar in solr/lib. But it seems
  that it had no effect.


Re: Solr Javascript+JSON not optimized for SEO

2010-10-23 Thread PeterKerk

Unfortunately it's not online yet, but is there anything I can clarify in more
detail?

Thanks!


Re: How to index long words with StandardTokenizerFactory?

2010-10-23 Thread Ahmet Arslan
Did you delete the folder Jetty_0_0_0_0_8983_solr.war_** under 
apache-solr-1.4.1\example\work?





Re: Modelling Access Control

2010-10-23 Thread Israel Ekpo
Hi Paul,

Regardless of how you implement it, I would recommend you use filter queries
for the permissions check rather than making it part of the main query.
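
(A minimal sketch of what that looks like on the request, with placeholder
values, following Paul's two formulations:)

  /solr/select?q=<actual query>&fq=acl:<user_id>
  /solr/select?q=<actual query>&fq=acl:(<project_id_1> OR <project_id_2> OR ...)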




-- 
°O°
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: How to index long words with StandardTokenizerFactory?

2010-10-23 Thread Sergey Bartunov
Yes. I did. Won't help.

On 23 October 2010 17:45, Ahmet Arslan iori...@yahoo.com wrote:
 Did you delete the folder Jetty_0_0_0_0_8983_solr.war_** under 
 apache-solr-1.4.1\example\work?







Re: How to index long words with StandardTokenizerFactory?

2010-10-23 Thread Ahmet Arslan
I think you should replace your new lucene-core-2.9.3-dev.jar in 
\apache-solr-1.4.1\lib and then create a new solr.war under 
\apache-solr-1.4.1\dist. And copy this new solr.war to 
solr/example/webapps/solr.war






Re: How to index long words with StandardTokenizerFactory?

2010-10-23 Thread Yonik Seeley
On Fri, Oct 22, 2010 at 12:07 PM, Sergey Bartunov sbos@gmail.com wrote:
 I'm trying to force solr to index words whose length is more than 255

If the field is not a text field, Solr's default analyzer is used,
which currently limits the token to 256 bytes.
Out of curiosity, what's your use case that you really need a single 34KB token?

-Yonik
http://www.lucidimagination.com


Re: How to index long words with StandardTokenizerFactory?

2010-10-23 Thread Sergey Bartunov
Look at the schema.xml that I provided. I use my own text_block type
which is derived from TextField. And I force the use of
StandardTokenizerFactory via the tokenizer tag.

If I use StrField type there are no problems with big data indexing.
The problem is in the tokenizer.

On 23 October 2010 18:55, Yonik Seeley yo...@lucidimagination.com wrote:
 On Fri, Oct 22, 2010 at 12:07 PM, Sergey Bartunov sbos@gmail.com wrote:
 I'm trying to force solr to index words whose length is more than 255

 If the field is not a text field, Solr's default analyzer is used,
 which currently limits the token to 256 bytes.
 Out of curiosity, what's your use case that you really need a single 34KB
 token?

 -Yonik
 http://www.lucidimagination.com



Re: How to index long words with StandardTokenizerFactory?

2010-10-23 Thread Sergey Bartunov
This is exactly what I did. Look:

  3) I replace lucene-core-2.9.3.jar in solr/lib/ by my
  lucene-core-2.9.3-dev.jar that I'd just compiled
  4) then I do ant compile and ant dist in the solr folder
  5) after that I recompile solr/example/webapps/solr.war

On 23 October 2010 18:53, Ahmet Arslan iori...@yahoo.com wrote:
 I think you should replace your new lucene-core-2.9.3-dev.jar in
 \apache-solr-1.4.1\lib and then create a new solr.war under
 \apache-solr-1.4.1\dist. And copy this new solr.war to
 solr/example/webapps/solr.war







Re: xpath processing

2010-10-23 Thread Ken Stanley
On Fri, Oct 22, 2010 at 11:52 PM, pghorp...@ucla.edu wrote:



 <dataConfig>
 <dataSource name="myfilereader" type="FileDataSource"/>
 <document>
 <entity name="f" rootEntity="false" dataSource="null"
 processor="FileListEntityProcessor" fileName=".*xml" recursive="true"
 baseDir="C:\data\sample_records\mods\starr">
 <entity name="x" dataSource="myfilereader" processor="XPathEntityProcessor"
 url="${f.fileAbsolutePath}" stream="false" forEach="/mods"
 transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
 <field column="id" template="${f.file}"/>
 <field column="collectionKey" template="starr"/>
 <field column="collectionName" template="starr"/>
 <field column="fileAbsolutePath" template="${f.fileAbsolutePath}"/>
 <field column="fileName" template="${f.file}"/>
 <field column="fileSize" template="${f.fileSize}"/>
 <field column="fileLastModified" template="${f.fileLastModified}"/>
 <field column="classification_keyword" xpath="/mods/classification"/>
 <field column="accessCondition_keyword" xpath="/mods/accessCondition"/>
 <field column="nameNamePart_s" xpath="/mods/name/namePart[@type = 'date']"/>
 </entity>
 </entity>
 </document>
 </dataConfig>


The documentation says you don't need a dataSource for your
XPathEntityProcessor entity; in my configuration, I have mine set to the
name of the top-level FileListEntityProcessor. Everything else looks fine.
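
(For illustration, a sketch of the inner entity with the dataSource attribute
dropped, keeping the attribute values from the config above:)

  <entity name="x" processor="XPathEntityProcessor"
          url="${f.fileAbsolutePath}" stream="false" forEach="/mods"
          transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
    <!-- field mappings as above -->
  </entity>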
Can you provide one record from your data? Also, are you getting any errors
in your log?

- Ken


Re: Modelling Access Control

2010-10-23 Thread Dennis Gearon
Two things will lessen the solr administrative load:

1/ Follow examples of databases and *nix OSs. Give each user their own group, 
or set up groups that don't have regular users as OWNERS, but can have users 
assigned to the group to give them particular permissions. I.E. Roles, like 
publishers, reviewers, friends, etc.

2/ Put your ACL outside of Solr, using your server-side/command line language's 
object oriented properties. Force all searches to come from a single location 
in code (not sure how to do that), and make the piece of code check 
authentication and authorization.

This is what my research shows about how others do it, and how I plan to do it.
ANY insight others have on this, I really want to hear.

Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.




Re: Modelling Access Control

2010-10-23 Thread Dennis Gearon
why use filter queries?

Wouldn't reducing the set headed into the filters by putting it in the main 
query be faster? (A question to learn, since I do NOT know :-)

Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.





Re: Modelling Access Control

2010-10-23 Thread Dennis Gearon
Forgot to add:
3/ The external application code selects the GROUPS that the user has
permission to read (so Solr will only serve up what is to be read?), then
searches on those groups, as sketched below.
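
(A sketch of step 3 as a Solr request, assuming each document carries a
groups field and the application resolved the user's groups to ids 12, 34
and 56:)

  /solr/select?q=<actual query>&fq=groups:(12 OR 34 OR 56)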


Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.





Re: Multiple indexes inside a single core

2010-10-23 Thread Erick Erickson
Ah, I should have read more carefully...

I remember this being discussed on the dev list, and I thought there might be
a Jira attached, but I sure can't find it.

If you're willing to work on it, you might hop over to the solr dev list and
start a discussion, maybe ask for a place to start. I'm sure some of the devs
have thought about this...

If nobody on the dev list says "There's already a JIRA on it", then you should
open one. The Jira issues are generally preferred when you start getting into
design because the comments are preserved for the next person who tries
the idea or makes changes, etc.

Best
Erick

On Wed, Oct 20, 2010 at 9:52 PM, Ben Boggess ben.bogg...@gmail.com wrote:

 Thanks Erick.  The problem with multiple cores is that the documents are
 scored independently in each core.  I would like to be able to search across
 both cores and have the scores 'normalized' in a way that's similar to what
 Lucene's MultiSearcher would do.  As far as I understand, multiple cores
 would likely result in seriously skewed scores in my case since the
 documents are not distributed evenly or randomly.  I could have one
 core/index with 20 million docs and another with 200.
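
(For context, a minimal sketch of the Lucene-side behavior Ben describes,
using the Lucene 2.9 API; index paths are hypothetical:)

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.*;
  import org.apache.lucene.store.FSDirectory;
  import java.io.File;

  public class MultiIndexSearch {
      public static void main(String[] args) throws Exception {
          // One searcher per sub-index; each index can be updated and
          // re-opened on its own schedule.
          IndexSearcher big   = new IndexSearcher(FSDirectory.open(new File("/idx/big")), true);
          IndexSearcher small = new IndexSearcher(FSDirectory.open(new File("/idx/small")), true);

          // MultiSearcher merges hits and computes term statistics across
          // all sub-indexes, so scores are comparable, unlike independent
          // per-core searches.
          Searcher searcher = new MultiSearcher(new Searchable[]{big, small});
          TopDocs hits = searcher.search(new TermQuery(new Term("body", "foo")), 10);
          System.out.println("total hits: " + hits.totalHits);
          searcher.close();
      }
  }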

 I've poked around in the code and this feature doesn't seem to exist.  I
 would be happy with finding a decent place to try to add it.  I'm not sure
 if there is a clean place for it.

 Ben

 On Oct 20, 2010, at 8:36 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  It seems to me that multiple cores are along the lines you
  need, a single instance of Solr that can search across multiple
  sub-indexes that do not necessarily share schemas, and are
  independently maintainable..
 
  This might be a good place to start:
 http://wiki.apache.org/solr/CoreAdmin
 
  HTH
  Erick
 
  On Wed, Oct 20, 2010 at 3:23 PM, ben boggess ben.bogg...@gmail.com
 wrote:
 
  We are trying to convert a Lucene-based search solution to a
  Solr/Lucene-based solution.  The problem we have is that we currently
 have
  our data split into many indexes and Solr expects things to be in a
 single
  index unless you're sharding.  In addition to this, our indexes wouldn't
  work well using the distributed search functionality in Solr because the
  documents are not evenly or randomly distributed.  We are currently
 using
  Lucene's MultiSearcher to search over subsets of these indexes.
 
  I know this has been brought up a number of times in previous posts and
 the
  typical response is that the best thing to do is to convert everything
 into
  a single index.  One of the major reasons for having the indexes split
 up
  the way we do is because different types of data need to be indexed at
  different intervals.  You may need one index to be updated every 20
 minutes
  and another is only updated every week.  If we move to a single index,
 then
  we will constantly be warming and replacing searchers for the entire
  dataset, and will essentially render the searcher caches useless.  If we
  were able to have multiple indexes, they would each have a searcher and
  updates would be isolated to a subset of the data.
 
  The other problem is that we will likely need to shard this large single
  index and there isn't a clean way to shard randomly and evenly across the
  whole of the data.  We would, however, like to shard a single data type.
  If we could use multiple indexes, we would likely also be sharding a small
  sub-set of them.
 
  Thanks in advance,
 
  Ben
 



Re: FieldCache

2010-10-23 Thread Erick Erickson
Why do you want to? Basically, the caches are there to improve
#searching#. To search something, you must index it. Retrieving
it is usually a rare enough operation that caching is irrelevant.

This smells like an XY problem, see:
http://people.apache.org/~hossman/#xyproblem

If this seems like gibberish, could you explain your problem
a little more?

Best
Erick

On Thu, Oct 21, 2010 at 10:20 AM, Mathias Walter mathias.wal...@gmx.net wrote:

 Hi,

 does a field which should be cached need to be indexed?

 I have a binary field which is just stored. Retrieving it via
 FieldCache.DEFAULT.getTerms returns empty BytesRefs.

 Then I found the following post:
 http://www.mail-archive.com/d...@lucene.apache.org/msg05403.html

 How can I use the FieldCache with a binary field?

 --
 Kind regards,
 Mathias




Re: How to index long words with StandardTokenizerFactory?

2010-10-23 Thread Ahmet Arslan
Oops, I am sorry. I thought that solr/lib referred to solrhome/lib.

I just tested this, and it seems that you have successfully increased the max
token length. You can verify this with the analysis.jsp page.

Despite analysis.jsp's output, it seems that some other mechanism is
preventing this huge token from being indexed. The response of
http://localhost:8983/solr/terms?terms.fl=body
does not have that huge token.

If you are interested in only prefix queries, as a workaround, you can use
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" />
at index time, so the query (without the star)
solr/select?q=body:big will return that document; see the sketch below.
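
(For illustration, a sketch of where that filter might sit; a hypothetical
field type built from the pieces discussed in this thread:)

  <fieldType name="text_block_prefix" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory" />
      <!-- index 1..25-character prefixes of each token so q=body:big matches -->
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" />
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory" />
    </analyzer>
  </fieldType>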

By the way, for this particular task you don't need to edit the lucene/solr
distro. You can use the class below with the standard pre-compiled solr.war,
by putting its jar into the SolrHome/lib directory.

package foo.solr.analysis;

import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.solr.analysis.BaseTokenizerFactory;
import java.io.Reader;


public class CustomStandardTokenizerFactory extends BaseTokenizerFactory {
  public StandardTokenizer create(Reader input) {
    final StandardTokenizer tokenizer = new StandardTokenizer(input);
    tokenizer.setMaxTokenLength(Integer.MAX_VALUE);
    return tokenizer;
  }
}

<fieldType name="text_block" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="foo.solr.analysis.CustomStandardTokenizerFactory" />
  </analyzer>
</fieldType>


Re: Solr sorting problem

2010-10-23 Thread Erick Erickson
In general, the behavior when sorting is not predictable when
sorting on a tokenized field, which "text" is. What would
it mean to sort on a field with "erick" and "Moazzam" as tokens
in a single document? Should it be in the e's or the m's?

That said, you probably want to watch out for case; see the sketch below.
Best
Erick

On Fri, Oct 22, 2010 at 10:02 AM, Moazzam Khan moazz...@gmail.com wrote:

 For anyone who faced the same problem, changing the field to string
 from text worked!

 -Moazzam

 On Fri, Oct 22, 2010 at 8:50 AM, Moazzam Khan moazz...@gmail.com wrote:
  The field type of the first name and last name is text. Could that be
  why it's not sorting properly? I just changed it to string and started
  a full-import. Hopefully that will work.
 
  Thanks,
  Moazzam
 
  On Thu, Oct 21, 2010 at 7:42 PM, Jayendra Patil
  jayendra.patil@gmail.com wrote:
 Need additional information.
 Sorting is easy in Solr, just by passing the sort parameter.
 
  However, when it comes to text sorting it depends on how you analyse
  and tokenize your fields
  Sorting does not work on fields with multiple tokens.
 
 http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F
 
  On Thu, Oct 21, 2010 at 7:24 PM, Moazzam Khan moazz...@gmail.com
 wrote:
 
  Hey guys,
 
  I have a list of people indexed in Solr. I am trying to sort by their
  first names but I keep getting results that are not alphabetically
  sorted (I see the names starting with W before the names starting with
  A). I have a feeling that the results are first being sorted by
  relevancy then sorted by first name.
 
  Is there a way I can get the results to be sorted alphabetically?
 
  Thanks,
  Moazzam
 
 
 



Re: MoreLikeThis explanation?

2010-10-23 Thread Koji Sekiguchi

Hi Darren,

Usually patches are written for the latest trunk branch at the time.

I've just updated the patch. Try it for the current trunk if you prefer.

Koji
--
http://www.rondhuit.com/en/

(10/10/22 19:10), Darren Govoni wrote:

Hi Koji,
I tried to apply your patch to the 1.4.0 tagged branch, but it didn't
take completely.
What branch does it work for?

Darren

On Thu, 2010-10-21 at 23:03 +0900, Koji Sekiguchi wrote:


(10/10/21 20:33), dar...@ontrenet.com wrote:

Hi,
Does the latest Solr provide an explanation for results returned by MLT?


No, but there is an open issue:

https://issues.apache.org/jira/browse/SOLR-860

Koji







Re: Modelling Access Control

2010-10-23 Thread Savvas-Andreas Moysidis
Pushing ACL logic outside Solr sounds like a prudent choice indeed as, in my
opinion, all of the business rules/conceptual logic should reside only
within the code boundaries. This way your domain will be easier to model and
your code easier to read, understand and maintain.

More information on Filter Queries, when they should be used and how they
affect performance can be found here:
http://wiki.apache.org/solr/FilterQueryGuidance

On 23 October 2010 20:00, Dennis Gearon gear...@sbcglobal.net wrote:

 Forgot to add,
 3/ The external, application code selects the GROUPS that the user has
 permission to read (Solr will only serve up what is to be read?) then search
 on those groups.


 Dennis Gearon

 Signature Warning
 
 It is always a good idea to learn from your own mistakes. It is usually a
 better idea to learn from others’ mistakes, so you do not have to make them
 yourself. from '
  http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

 EARTH has a Right To Life,
  otherwise we all die.


 



Re: pf parameter in edismax (SOLR-1553)

2010-10-23 Thread Jan Høydahl / Cominvent
Answering my own question:
The pf feature only kicks in with a multi-term q param. In my case I used a
field tokenized by KeywordTokenizer, hence pf never kicked in.
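
(A sketch of the intended behavior, with hypothetical parsed queries over
whitespace-tokenized title and body fields:)

  q=foo bar&qf=title^2.0 body^0.5&pf=title^50.0
    per-term query : (title:foo^2.0 | body:foo^0.5) (title:bar^2.0 | body:bar^0.5)
    pf boost       : title:"foo bar"^50.0

With a KeywordTokenizer field, "foo bar" stays one term, so there is no
multi-term query for pf to build a phrase from.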

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 14. okt. 2010, at 13.29, Jan Høydahl / Cominvent wrote:

 Hi,
 
 Have applied SOLR-1553 to 1.4.2 and it works great.
 However, I can't get the pf param to work. Example:
   q=foo bar&qf=title^2.0 body^0.5&pf=title^50.0
 
 Shouldn't I see the phrase query boost in debugQuery? Currently I see no 
 trace of pf being used.
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 



Re: Modelling Access Control

2010-10-23 Thread Israel Ekpo
Hi All,

I think using filter queries will be a good option to consider because of
the following reasons

* The filter query does not affect the score of the items in the result set.
If the ACL logic is part of the main query, it could influence the scores of
the items in the result set.

* Using a filter query could lead to better performance in complex queries
because the results from the query specified with fq are cached
independently from that of the main query. Since the result of a filter
query is cached, it will be used to filter the primary query result using
set intersection without having to fetch the ids of the documents from the
fq again a second time.

I think this will be useful because we could assume that the ACL portion in
the fq is relatively constant, since the permissions for each user are not
something that changes frequently.

http://wiki.apache.org/solr/FilterQueryGuidance
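
(A sketch of the caching effect, with hypothetical queries; the second request
reuses the cached ACL filter and only evaluates the new q:)

  /solr/select?q=report&fq=acl:42
  /solr/select?q=meeting+notes&fq=acl:42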






-- 
°O°
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/