abt Multicore

2008-11-17 Thread Raghunandan Rao
Hi,

I have an app running on WebLogic and Oracle. The Oracle DB is quite huge,
say some 10 million records. I need to integrate Solr for this and I am
planning to use multicore. How can the multicore feature be put to best
use?

 

-Raghu



Re: Build Solr to run SolrJS

2008-11-17 Thread JCodina


To give you more information.

The error I get is this one:

java.lang.NoClassDefFoundError:
org/apache/solr/request/VelocityResponseWriter (wrong name:
contrib/velocity/src/main/java/org/apache/solr/request/VelocityResponseWriter)
at java.lang.ClassLoader.defineClass1(Native Method) at
java.lang.ClassLoader.defineClass(ClassLoader.java:621) at
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124) at
org.apache.catalina.loader.WebappClassLoader.findClassInternal(WebappClassLoader.java:1847)
at
org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:890)
at
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1354)
at
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1233)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) at
java.lang.Class.forName0(Native Method) at
java.lang.Class.forName(Class.java:247) at 


And in the build, if I do a build-contrib-dist I get these messages:
...
build:
  [jar] Building jar:
/home/joan/workspace/solr/contrib/dataimporthandler/target/apache-solr-dataimporthandler-1.4-dev.jar
dist:
 [copy] Copying 1 file to
/home/joan/workspace/solr/build/web/WEB-INF/lib
 [copy] Copying 1 file to /home/joan/workspace/solr/dist
init:
init-forrest-entities:
compile-common:
compile:
make-manifest:
compile:
[javac] Compiling 4 source files to
/home/joan/workspace/solr/contrib/velocity/target/classes
build:
  [jar] Building jar:
/home/joan/workspace/solr/contrib/velocity/src/main/solr/lib/apache-solr-velocity-1.4-dev.jar
dist:
...
where the dataimporthandler jar gets copied into the dist folders but the
velocity one is not.


Hope this saves you some time..

Joan
-- 
View this message in context: 
http://www.nabble.com/Build-Solr-to-run-SolrJS-tp20526644p20535777.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: abt Multicore

2008-11-17 Thread Shalin Shekhar Mangar
On Mon, Nov 17, 2008 at 2:17 PM, Raghunandan Rao 
[EMAIL PROTECTED] wrote:


 I have an app running on WebLogic and Oracle. The Oracle DB is quite huge,
 say some 10 million records. I need to integrate Solr for this and I am
 planning to use multicore. How can the multicore feature be put to best
 use?


To index records from a database, you can take a look at DataImportHandler.

It would help if you could be a bit more specific. What exactly do you want
to know? It also helps if you tell us why you want to know about one
particular thing, so that we can suggest better alternative solutions.

-- 
Regards,
Shalin Shekhar Mangar.
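[Editor's note: to make the DataImportHandler suggestion concrete, a minimal data-config.xml for pulling rows out of a database could look like the sketch below. The JDBC URL, table, and column names are hypothetical placeholders, not taken from the thread.]

```xml
<dataConfig>
  <!-- JDBC connection details are placeholders; adjust for your Oracle instance -->
  <dataSource driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@dbhost:1521:orcl"
              user="app" password="secret" />
  <document>
    <!-- Each row returned by the query becomes one Solr document -->
    <entity name="item" query="SELECT id, title, description FROM items">
      <field column="id" name="id" />
      <field column="title" name="title" />
      <field column="description" name="description" />
    </entity>
  </document>
</dataConfig>
```

A full import is then triggered with /dataimport?command=full-import.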


using deduplication with dataimporthandler

2008-11-17 Thread Marc Sturlese

Hey there,

I have posted before telling about my situation but I think my explanation
was a bit confusing...
I am using DataImportHandler and delta-import and it's working perfectly. I
have also coded my own SqlEntityProcessor to delete expired rows from the
index and database.

Now I need to do duplication control at indexing time. In my old Lucene core
I wrote my own duplication control, but it was slow because it compared
strings... I have been investigating Solr deduplication
(http://wiki.apache.org/solr/Deduplication) and it looks great, since it
works with hashes instead of strings.

I have learned how to use deduplication with the /update requestHandler as
the wiki says:

  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
    <lst name="defaults">
      <str name="update.processor">dedupe</str>
    </lst>
  </requestHandler>

But I want to use it with the /dataimport requestHandler (the one used by
DataImportHandler). I don't know if there is an XML configuration that adds
deduplication to DataImportHandler, or whether I should code a plugin... and
in that case, I don't know exactly where.

Hope my explanation is clearer now...
Thanks in advance!





Re: using deduplication with dataimporthandler

2008-11-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
Any update processor can be used with DIH. First, register your dedupe
update processor as you do now. Then you can either pass update.processor as
a request parameter, or keep it in the 'defaults' of the dataimport handler:

 <str name="update.processor">dedupe</str>

On Mon, Nov 17, 2008 at 2:48 PM, Marc Sturlese [EMAIL PROTECTED] wrote:

 Hey there,

 I have posted before telling about my situation but I think my explanation
 was a bit confusing...
 I am using DataImportHandler and delta-import and it's working perfectly. I
 have also coded my own SqlEntityProcessor to delete expired rows from the
 index and database.

 Now I need to do duplication control at indexing time. In my old Lucene core
 I wrote my own duplication control, but it was slow because it compared
 strings... I have been investigating Solr deduplication
 (http://wiki.apache.org/solr/Deduplication) and it looks great, since it
 works with hashes instead of strings.

 I have learned how to use deduplication with the /update requestHandler as
 the wiki says:

   <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
     <lst name="defaults">
       <str name="update.processor">dedupe</str>
     </lst>
   </requestHandler>

 But I want to use it with the /dataimport requestHandler (the one used by
 DataImportHandler). I don't know if there is an XML configuration that adds
 deduplication to DataImportHandler, or whether I should code a plugin... and
 in that case, I don't know exactly where.

 Hope my explanation is clearer now...
 Thanks in advance!







-- 
--Noble Paul
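[Editor's note: putting Noble's 'defaults' suggestion together, the solrconfig.xml registration could look roughly like this, assuming the dedupe update processor chain from the wiki is already registered; the config file name is illustrative.]

```xml
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <!-- run the same dedupe update processor that /update uses -->
    <str name="update.processor">dedupe</str>
  </lst>
</requestHandler>
```

Alternatively, append &update.processor=dedupe to the /dataimport request URL.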


RE: solr 1.3 Modification field in schema.xml

2008-11-17 Thread sunnyfr

Hi Todd,

Thanks for this answer. OK, but it's not just about whether a field shows up
in the result list: if a field is not shown but is boosted using qf, do I
need to store it? And what about a language field which needs some special
configuration like stemming?
thanks a lot for your clear answer,


I believe (someone correct me if I'm wrong) that the only fields you
need to store are those fields which you wish returned from the query.
In other words, if you will never put the field on the list of fields
(fl) to return, there is no need to store it.

It would be advantageous not to store more than you have to. It reduces
disk access, index size, memory usage, etc. However, you have to balance
this against future needs. If re-indexing is costly just to start
storing 1 more field, it may be worth it to just leave it in.

-Todd Feak




Re: Need help with SolrIndexSearcher CoreContainer

2008-11-17 Thread Kraus, Ralf | pixelhouse GmbH

Hi,

After 5-6 searches I run out of memory :-(

Example:

    String homeDir = "/var/lib/tomcat5.5/webapps/solr";
    File configFile = new File(homeDir, "solr.xml");
    CoreContainer myCoreContainer = new CoreContainer(homeDir, configFile);

    mySolrCore = myCoreContainer.getCore("core_de");
    RefCounted<SolrIndexSearcher> temp_search = mySolrCore.getSearcher();

    SolrIndexSearcher searcher = temp_search.get();

Has no one ever worked directly with CoreContainer and SolrIndexSearcher?

Greets -Ralf-
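[Editor's note: a likely cause of this leak is that the reference-counted searcher and the core reference are never released, so each search pins another searcher in memory. A sketch of the usual pattern, assuming the same setup as above (not compilable standalone; it needs the Solr jars on the classpath):]

```java
// Sketch only: getSearcher() returns a reference-counted holder that must
// be released with decref(), and getCore() increments the core's refcount,
// which close() releases.
RefCounted<SolrIndexSearcher> holder = mySolrCore.getSearcher();
try {
    SolrIndexSearcher searcher = holder.get();
    // ... run the query with this searcher ...
} finally {
    holder.decref();   // without this, old searchers accumulate
}
mySolrCore.close();    // release the reference taken by getCore()
```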


Re: solr 1.3 Modification field in schema.xml

2008-11-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Nov 13, 2008 at 10:43 PM, sunnyfr [EMAIL PROTECTED] wrote:

 Hi everybody,

 I don't really get when I have to re-index the data and when not.
 I did a full import but I realised I stored too many fields which I don't
 need.

 So I have to change some indexed fields from stored to not stored.
 And I don't know if I have to re-index my data or not, and in which cases
 I really have to re-index.
You will have to re-index.

 Another question: I would like to know which fields must be stored. I
 thought it was fields used by boost functions, but I just tried boosting a
 field that is indexed but not stored and it worked.

 Thanks a lot for putting some light on my questions,






-- 
--Noble Paul


Re: using deduplication with dataimporthandler

2008-11-17 Thread Marc Sturlese

Thank you so much. I have it sorted.
I am wondering now if there is a more stable way to use deduplication than
adding this patch to the Solr source project:
https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
(SOLR-799.patch, 2008-11-12 05:10 PM, this one exactly).

I have downloaded the latest nightly-build source code and couldn't see the
needed classes in there.
Does anyone know anything? Should I ask this on the developers list?

Thanks in advance


Marc Sturlese wrote:
 
 Hey there,

 I have posted before telling about my situation but I think my explanation
 was a bit confusing...
 I am using DataImportHandler and delta-import and it's working perfectly.
 I have also coded my own SqlEntityProcessor to delete expired rows from
 the index and database.

 Now I need to do duplication control at indexing time. In my old Lucene
 core I wrote my own duplication control, but it was slow because it
 compared strings... I have been investigating Solr deduplication
 (http://wiki.apache.org/solr/Deduplication) and it looks great, since it
 works with hashes instead of strings.

 I have learned how to use deduplication with the /update requestHandler
 as the wiki says:

   <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
     <lst name="defaults">
       <str name="update.processor">dedupe</str>
     </lst>
   </requestHandler>

 But I want to use it with the /dataimport requestHandler (the one used by
 DataImportHandler). I don't know if there is an XML configuration that
 adds deduplication to DataImportHandler, or whether I should code a
 plugin... and in that case, I don't know exactly where.

 Hope my explanation is clearer now...
 Thanks in advance!
 
 
 




Re: using deduplication with dataimporthandler

2008-11-17 Thread Marc Sturlese



Marc Sturlese wrote:
 
 Thank you so much. I have it sorted.
 I am wondering now if there is a more stable way to use deduplication
 than adding this patch to the Solr source project:
 https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 (SOLR-799.patch, 2008-11-12 05:10 PM, this one exactly).

 I have downloaded the latest nightly-build source code and couldn't see
 the needed classes in there.
 Does anyone know anything? Should I ask this on the developers list?

 The thing is I can't find the class
 org.apache.solr.update.processor.DeduplicateUpdateProcessorFactory
 anywhere...

 Thanks in advance
 
 



Re: using deduplication with dataimporthandler

2008-11-17 Thread Shalin Shekhar Mangar
On Mon, Nov 17, 2008 at 5:18 PM, Marc Sturlese [EMAIL PROTECTED]wrote:


 Thank you so much. I have it sorted.
 I am wondering now if there is a more stable way to use deduplication
 than adding this patch to the Solr source project:

 https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 (SOLR-799.patch, 2008-11-12 05:10 PM, this one exactly).

 I have downloaded the latest nightly-build source code and couldn't see
 the needed classes in there.
 Does anyone know anything? Should I ask this on the developers list?


The issue is still open, but I don't think it will remain open for long.
Most likely, it will be released with the next Solr version.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Build Solr to run SolrJS

2008-11-17 Thread Erik Hatcher


On Nov 17, 2008, at 3:55 AM, JCodina wrote:

java.lang.NoClassDefFoundError:
org/apache/solr/request/VelocityResponseWriter (wrong name:

...

 [jar] Building jar:
/home/joan/workspace/solr/contrib/dataimporthandler/target/apache-solr-dataimporthandler-1.4-dev.jar

dist:
...

 [jar] Building jar:
/home/joan/workspace/solr/contrib/velocity/src/main/solr/lib/apache-solr-velocity-1.4-dev.jar

dist:
...
where the dataimporthandler jar gets copied into the dist folders but the
velocity one is not.


Correct - I didn't want VelocityResponseWriter put into the example WAR by
default. It's a contrib, not core, so I intentionally put it in a separate
lib directory.

Here are the instructions to wire it in successfully from trunk:
http://wiki.apache.org/solr/VelocityResponseWriter



However, it isn't currently suitable for wiring to SolrJS - Matthias  
and I will have to resolve that.


Erik



Re: Build Solr to run SolrJS

2008-11-17 Thread Erik Hatcher


On Nov 16, 2008, at 1:40 PM, Matthias Epheser wrote:
Matthias and Ryan - let's get SolrJS integrated into contrib/velocity.
Any objections/reservations?


As SolrJS may be used without velocity at all (using e.g.
ClientSideWidgets), is it possible to put it into contrib/javascript and
create a dependency on contrib/velocity for ServerSideWidgets?


Sure, contrib/javascript sounds perfect.

If that's ok, I'll have a look at the directory structure and the  
current ant build.xml to make them fit into the common solr  
structure and build.


Awesome, thanks!

Erik



Re: Solr security

2008-11-17 Thread Erik Hatcher


On Nov 16, 2008, at 6:12 PM, Ian Holsman wrote:
famous last words and all, but you shouldn't be just passing what a user
types directly into an application, should you?


LOL

I'd be parsing out wildcards, boosts, and fuzzy searches (or at least
thinking about the effects).
I mean, "jakarta apache"~1000 or roam~0.1 aren't as efficient as a
regular query.


Sounds like the perfect case for a query parser plugin... or use  
dismax as Ryan mentioned.  Shouldn't Solr be hardened for these cases  
anyway?  Or at least hardenable.



but they don't let me into design meetings any more ;(


Apparently they shouldn't let me into them either ;)

Erik



Re: Solr security

2008-11-17 Thread Erik Hatcher


On Nov 16, 2008, at 6:18 PM, Ryan McKinley wrote:

my assumption with solrjs is that you are hitting read-only solr  
servers that you don't mind if people query directly.


Exactly the assumption I'm going with too.

 It would not be appropriate for something where you don't want  
people (who really care) to know you are running solr and could  
execute arbitrary queries.


Since it is an example, I don't mind leaving the /admin interface  
open on:

http://example.solrstuff.org/solrjs/admin/
but /update has a password:
http://example.solrstuff.org/solrjs/update

I have said in the past I like the idea of a read-only flag in  
solr config that would throw an error if you try to do something  
with the UpdateHandler.  However there are other ways to do that also.


Yes, I was asked about this elusive read-only switch at Solr Boot Camp  
at ApacheCon as well.


How are you password protecting the update handler?  This is the kind  
of goody I'd like to distill out of this thread and wikify http://wiki.apache.org/solr/SolrSecurity 



What's it take to make a read-only Solr server now?  Can replication
still be made to work?  (I plead ignorance on the guts of the Java-based
replication feature) - requires password protected handlers?
Shouldn't we bake some of this into the default example configuration
instead of update handlers being wide open by default?


Erik




Re: Solr security

2008-11-17 Thread Erik Hatcher


On Nov 16, 2008, at 6:27 PM, Ryan McKinley wrote:
I'd be parsing out wildcards, boosts, and fuzzy searches (or at least
thinking about the effects).
I mean, "jakarta apache"~1000 or roam~0.1 aren't as efficient as a
regular query.




Even if you leave the solr instance public, you can still limit grossly
inefficient params by forcing things to use the dismax query parser.
You can use invariants to lock what options are available.


I suppose we don't have a way to say the *maximum* number of rows  
you can request is 100 (or something like that)


A LimitingRowsSearchComponent could easily do this as a plugin though.

Erik
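[Editor's note: as a sketch of what such a component could do. The class name and limits here are made up (no LimitingRowsSearchComponent exists in Solr), and per Walter's later point the start offset needs capping too, not just rows. The wiring into an actual SearchComponent is omitted.]

```java
// Hypothetical request-limiting logic a custom SearchComponent could apply
// to the start and rows parameters before Solr executes the query.
public class QueryLimits {
    public static final int MAX_START = 1000; // cap deep paging
    public static final int MAX_ROWS = 100;   // cap page size

    /** Clamp a requested value into the range [0, max]. */
    public static int clamp(int requested, int max) {
        if (requested < 0) {
            return 0;
        }
        return Math.min(requested, max);
    }

    public static void main(String[] args) {
        // A request for start=20000 gets pulled back to the cap
        System.out.println(clamp(20000, MAX_START)); // prints 1000
        System.out.println(clamp(50, MAX_ROWS));     // prints 50
    }
}
```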



Re: Solr security

2008-11-17 Thread Erik Hatcher


On Nov 16, 2008, at 6:55 PM, Walter Underwood wrote:

Limiting the maximum number of rows doesn't work, because
they can request rows 2-20100. --wunder


But you could limit how many rows could be returned in a single  
request... that'd close off one DoS mechanism.


Erik



Re: Solr security

2008-11-17 Thread Yonik Seeley
On Mon, Nov 17, 2008 at 8:54 AM, Erik Hatcher
[EMAIL PROTECTED] wrote:
 Sounds like the perfect case for a query parser plugin... or use dismax as
 Ryan mentioned.  Shouldn't Solr be hardened for these cases anyway?  Or at
 least hardenable.

Say you do filtering by user - how would you enforce that the client
(if it's a browser) only sends in the proper filter?  It doesn't seem like
you can unless you put all the user authentication stuff and
application logic right in Solr.

Now I guess you *could* stick everything in Solr that you would
normally stick in the middle tier, but it doesn't seem like a great
idea to me.

-Yonik


Re: abt Multicore

2008-11-17 Thread Ryan McKinley
Are all the documents in the same search space?  That is, for a given  
query, could any of the 10MM docs be returned?


If so, I don't think you need to worry about multicore.  You may  
however need to put part of the index on various machines:

http://wiki.apache.org/solr/DistributedSearch

ryan


On Nov 17, 2008, at 3:47 AM, Raghunandan Rao wrote:


Hi,

 I have an app running on WebLogic and Oracle. The Oracle DB is quite huge,
 say some 10 million records. I need to integrate Solr for this and I am
 planning to use multicore. How can the multicore feature be put to best
 use?



-Raghu
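[Editor's note: for reference, a sharded query against the DistributedSearch setup Ryan mentions looks like the following; the hostnames are hypothetical.]

```
http://host1:8983/solr/select?shards=host1:8983/solr,host2:8983/solr&q=oracle&start=0&rows=10
```

Each shard holds part of the index; Solr queries all listed shards and merges the per-shard results before returning them.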





Using properties from core configuration in data-config.xml

2008-11-17 Thread gistolero
Hello,

is it possible to use properties from the core configuration in
data-config.xml? I want to define the baseDir for the DataImportHandler.


I tried the following configuration:


*** solr.xml ***

<solr persistent="false">
  <cores adminPath="null">
    <core name="core0" instanceDir="/opt/solr/cores/core0">
      <property name="solrDataDir" value="/opt/solr/cores/core0/data" />
      <property name="xmlDataDir" value="/home/xml/core0" />
    </core>
    ...
  </cores>
</solr>




*** data-config.xml ***

<dataConfig>
  <dataSource type="FileDataSource" />
  <document>
    <entity name="xmlFile"
            processor="FileListEntityProcessor"
            baseDir="${xmlDataDir}"
            fileName="id-.*\.xml"
            rootEntity="false"
            dataSource="null">
      <entity name="data"
              pk="id"
              url="${xmlFile.fileAbsolutePath}"
              processor="XPathEntityProcessor"
...
</dataConfig>



But this is the result:

...
Nov 17, 2008 1:50:08 PM org.apache.solr.handler.dataimport.DataImporter 
doFullImport
INFO: Starting Full Import
Nov 17, 2008 1:50:08 PM org.apache.solr.core.SolrCore execute
INFO: [posts-politics] webapp=/solr path=/dataimport 
params={optimize=true&commit=true&command=full-import&qt=/dataimport&wt=javabin&version=2.2}
 status=0 QTime=66 
Nov 17, 2008 1:50:08 PM org.apache.solr.core.SolrCore execute
INFO: [posts-politics] webapp=/solr path=/dataimport 
params={qt=/dataimport&wt=javabin&version=2.2} status=0 QTime=0 
Nov 17, 2008 1:50:08 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [posts-politics] REMOVING ALL DOCUMENTS FROM INDEX
Nov 17, 2008 1:50:08 PM org.apache.solr.handler.dataimport.DataImporter 
doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir' should 
point to a directory Processing Document # 1
 at 
org.apache.solr.handler.dataimport.FileListEntityProcessor.init(FileListEntityProcessor.java:81)
...




I tried also to configure all dataimport settings in solrconfig.xml, but I 
don't know how to do this exactly. Among other things, I tried this format:


*** solrconfig.xml ***

...
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <lst name="datasource">
      <str name="type">FileDataSource</str>
      <lst name="document">
        <lst name="entity">
          <str name="name">xmlFile</str>
          <str name="processor">FileListEntityProcessor</str>
          <str name="baseDir">${xmlDataDir}</str>
          <str name="fileName">id-.*\.xml</str>
          <str name="rootEntity">false</str>
          <str name="dataSource">null</str>
          <lst name="entity">
            <str name="name">data</str>
            <str name="pk">id</str>
            <str name="url">${xmlFile.fileAbsolutePath}</str>
          ...
</requestHandler>
...



But all my tests (with different dataimport formats in solrconfig.xml) failed:


...
INFO: Reusing parent classloader
Nov 17, 2008 2:18:14 PM org.apache.solr.common.SolrException log
SEVERE: Error in solrconfig.xml:org.apache.solr.common.SolrException: No system 
property or default value specified for xmlFile.fileAbsolutePath
at 
org.apache.solr.common.util.DOMUtil.substituteProperty(DOMUtil.java:311)
at 
org.apache.solr.common.util.DOMUtil.substituteProperties(DOMUtil.java:264)
...



Thanks again for your excellent support!

Gisto
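[Editor's note: one workaround worth trying here: DataImportHandler can resolve ${dataimporter.request.*} placeholders from the request parameters, so the base directory can be passed per-core at import time instead of through property substitution. Whether the DIH build in use supports this is worth verifying; the sketch below assumes it does.]

```xml
<dataConfig>
  <dataSource type="FileDataSource" />
  <document>
    <!-- baseDir comes from the xmlDataDir request parameter -->
    <entity name="xmlFile"
            processor="FileListEntityProcessor"
            baseDir="${dataimporter.request.xmlDataDir}"
            fileName="id-.*\.xml"
            rootEntity="false"
            dataSource="null">
      ...
    </entity>
  </document>
</dataConfig>
```

The import would then be triggered with /dataimport?command=full-import&xmlDataDir=/home/xml/core0.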



Re: Solr security

2008-11-17 Thread Erik Hatcher


On Nov 17, 2008, at 9:07 AM, Yonik Seeley wrote:

On Mon, Nov 17, 2008 at 8:54 AM, Erik Hatcher
[EMAIL PROTECTED] wrote:
Sounds like the perfect case for a query parser plugin... or use  
dismax as
Ryan mentioned.  Shouldn't Solr be hardened for these cases  
anyway?  Or at

least hardenable.


Say you do filtering by user - how would you enforce that the client
(if it's a browser) only send in the proper filter?


Ryan already mentioned his technique... and here's how I'd do it  
similarly...


   Write a custom servlet Filter that groks roles/authentication (you'd
need this piece in any Java application tier anyway), or plug in an
existing implementation through Spring or something like that. The
massaging of the request to Solr could then happen in that pipeline, or by
adding a query parameter to the Solr request (ignoring anything sent by
the client for, say, user=...). Perhaps plug in a custom SearchComponent
that massages a request parameter into a Solr filter query or whatever.



 Doesn't seem like
you can unless you put all the user authentication stuff and
application logic right in Solr.


   ;)

Exactly.  Sort of.


Now I guess you *could* stick everything in Solr that you would
normally stick in the middle tier, but it doesn't seem like a great
idea to me.


Let's be clear about where we are drawing the boundaries of the  
definition of Solr.


One could say that Solr is solr.war and the HTTP conventions.  Or is  
it solr.jar?  Or is it the SolrJ API?


Erik
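[Editor's note: a bare-bones sketch of that Filter idea, forcing a server-side filter query regardless of what the client sent. The parameter handling and the principal-to-filter mapping here are illustrative only; it needs the servlet API on the classpath.]

```java
// Servlet Filter that rewrites requests to Solr so the user filter always
// comes from the authenticated principal, never from the client.
public class UserFilterServletFilter implements javax.servlet.Filter {
    public void doFilter(javax.servlet.ServletRequest req,
                         javax.servlet.ServletResponse resp,
                         javax.servlet.FilterChain chain)
            throws java.io.IOException, javax.servlet.ServletException {
        javax.servlet.http.HttpServletRequest http =
            (javax.servlet.http.HttpServletRequest) req;
        final String user = http.getRemoteUser(); // from container auth

        // Wrap the request: ignore any client-supplied "fq" and substitute
        // the trusted one derived from the principal.
        javax.servlet.http.HttpServletRequestWrapper wrapped =
            new javax.servlet.http.HttpServletRequestWrapper(http) {
                @Override
                public String getParameter(String name) {
                    if ("fq".equals(name)) {
                        return "user:" + user; // trusted filter query
                    }
                    return super.getParameter(name);
                }
            };
        chain.doFilter(wrapped, resp);
    }

    public void init(javax.servlet.FilterConfig cfg) {}
    public void destroy() {}
}
```

In practice getParameterValues() and getParameterMap() need the same treatment, since Solr reads multi-valued parameters through those.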



Re: Solr security

2008-11-17 Thread Matthias Epheser

Erik Hatcher schrieb:


On Nov 16, 2008, at 6:18 PM, Ryan McKinley wrote:

my assumption with solrjs is that you are hitting read-only solr 
servers that you don't mind if people query directly.


Exactly the assumption I'm going with too.

 It would not be appropriate for something where you don't want people 
(who really care) to know you are running solr and could execute 
arbitrary queries.


Since it is an example, I don't mind leaving the /admin interface open 
on:

http://example.solrstuff.org/solrjs/admin/
but /update has a password:
http://example.solrstuff.org/solrjs/update

I have said in the past I like the idea of a read-only flag in solr 
config that would throw an error if you try to do something with the 
UpdateHandler.  However there are other ways to do that also.




As the thoughts and ideas of this thread are spread in several emails, let me 
just drop my uncoordinated thoughts here:


For solrjs, what exactly is the required information solr has to provide 
directly:


- We need data for several widgets. This data will be in 99% of the cases some 
facet information and/or result docs. The result docs will be in suitable 
ranges, no webpage will display 10+ result items at the same time.


- So potentially dangerous request params like rows>1000, or handlers other 
than the StandardRequestHandler, may be blocked.


- update handlers and admin interface shouldn't be exposed.


Like others mentioned before, I'm not sure this is a task that *has* to be 
solved inside Solr. As a standalone servlet, it is very likely NOT 
accessible directly in a production environment.


Hiding or password-protecting update/admin is an easy task using a proxy 
like Apache httpd. It could also be solved by a configurable ServletFilter 
delivered with Solr, initialized inside Solr's web.xml. To separate the 
concerns, I think it should not be coded deeper inside the Solr code. The 
idea of a read-only server can be implemented like that. Optional update 
URLs that are only accessible inside a firewall may also be present.


This servlet filter may also check the request params for things that are not 
needed for solrjs and potentially dangerous. It even may check how frequently 
urls are accessed (thinking about DoS).


I think that even if it looks like direct access, using solrjs doesn't have 
to be different from common Solr webapps. Usually these apps take user 
input, a web application translates this input into a Solr query, and it 
translates the result into a suitable client format. Other Solr stuff is 
blocked indirectly because only this app has access to Solr. Now the last 
two steps are done inside the client. But if we block stuff that isn't used 
by the client, we are in control of what may happen.


If that isn't secure enough, the more complicated solution would be to 
create a stateful servlet that holds the query state of a client, so that 
SolrJS only performs /select/solrjs/?new_query=city:vienna or something. 
Then the query generation and all Solr-related stuff happens again on the 
server.


I think it should be easy to deliver this SecuritySolrFilter with the 
standard Solr distribution, making it configurable for the user to decide 
which URLs are blocked/password protected and which request parameters 
should be checked for illegal values. On the other hand, existing firewalls 
and proxies of the destination system may be used. Therefore some 
best practices in the Solr wiki may be helpful.


It would be fine by me to help implement a standard security filter for Solr.

WDYT?

regards,
matthias
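[Editor's note: as one concrete best-practice for the proxy approach mentioned above, fronting Solr with Apache httpd and locking down the update/admin URLs can be as simple as the following. The paths, network range, and auth file are examples only.]

```apache
# Reverse-proxy Solr, but require auth for anything that can modify it
ProxyPass /solr http://localhost:8983/solr
ProxyPassReverse /solr http://localhost:8983/solr

<Location /solr/update>
    AuthType Basic
    AuthName "Solr update"
    AuthUserFile /etc/httpd/solr.htpasswd
    Require valid-user
</Location>

<Location /solr/admin>
    Order deny,allow
    Deny from all
    Allow from 192.168.0.0/16   # internal network only
</Location>
```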


Re: Solr security

2008-11-17 Thread Walter Underwood
Limiting the number of rows only handles one attack. The one I mentioned,
fetching one page deep in the result set, caused a big issue on prod at
our site. We needed to limit the max for start as well as rows.

It is possible to make it safe, but a lot of work. We did this for
Ultraseek. I would always, always front it with Apache, to get some
of Apache's protection.

wunder

On 11/17/08 6:04 AM, Erik Hatcher [EMAIL PROTECTED] wrote:
 
 On Nov 16, 2008, at 6:55 PM, Walter Underwood wrote:
 Limiting the maximum number of rows doesn't work, because
 they can request rows 2-20100. --wunder
 
 But you could limit how many rows could be returned in a single
 request... that'd close off one DoS mechanism.
 
 Erik




Re: Solr security

2008-11-17 Thread Erik Hatcher


On Nov 17, 2008, at 10:22 AM, Walter Underwood wrote:

It is possible to make it safe, but a lot of work. We did this for
Ultraseek. I would always, always front it with Apache, to get some
of Apache's protection.


What protections specifically are you speaking of with Apache in  
front?  Authentication?  Row limiting?


Erik



Solr build with Rich text document plugin added?

2008-11-17 Thread Rav Bhagdev



Solr build with Rich Document (Doc/PDF etc) plugin already added?

2008-11-17 Thread Rav Bhagdev



Advice for indexing page numbers

2008-11-17 Thread Ian Connor
How would you best deal with a page field in solr?

Possible ranges are numbers (1 to 1000s) but could also include appendix
pages that use roman numerals and letters (i, ii, iii, iv, as well as
a, b, c, etc.).

It makes sense people would want to search for things between page 1 to 5
but I cannot really see how someone would search for page iv to 50.

I was thinking of splitting this into two fields: one just a string for
exact matching (maybe case insensitive), and the other a number for ranges.
This would allow searching page ranges as well as exact matches.

Has anyone had experience with pages or the like in Solr? Is splitting into
two fields like this needed, or can I do it with one of the standard
filters that I have missed?

-- 
Regards,
Ian Connor
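[Editor's note: the two-field idea can be expressed directly in schema.xml; something like the sketch below. The field names are made up, the numeric field would simply be left unpopulated for roman-numeral pages, and for truly case-insensitive label matching a lowercasing analyzed type would be needed instead of plain string.]

```xml
<!-- exact match on the printed page label ("iv", "a", "12", ...) -->
<field name="page_label" type="string" indexed="true" stored="true" />
<!-- numeric page for range queries; a sortable int type (sint in the
     1.3 example schema) makes range queries compare numerically -->
<field name="page_num" type="sint" indexed="true" stored="false" />
```

Queries like page_num:[1 TO 5] then handle ranges, while page_label:iv handles exact appendix pages.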


Re: Solr security

2008-11-17 Thread Ryan McKinley



Say you do filtering by user - how would you enforce that the client
(if it's a browser) only send in the proper filter?


Ryan already mentioned his technique... and here's how I'd do it  
similarly...


 Write a custom servlet Filter that grokked roles/authentication  
(this piece you'd need in any Java application tier anyway) [or  
plugin in an existing implementation through Spring or something  
like that]  And then massaging of the request to Solr could happen  
in that pipeline, or adding a query parameter to the Solr request  
(ignoring anything sent by the client request for say, user=...).   
Perhaps plug in a custom SearchComponent that massaged a request  
parameter into a Solr filter query or whatever.




right, but the question is still: is there anything general enough to  
be in solr core?


Everything I can think of requires a good sense of how the auth model  
is encoded in your data and how you want to expose it.  Nothing I have  
done is general enough to share with even my next project.


The only thing I could imagine is perhaps adding getUserPrincipal()
to the SolrRequest interface -- but this quickly explodes into also
wanting the request method (POST vs GET) or the user-agent... in the
end I just add the HttpServletRequest to the context and grab stuff
from there.  Perhaps the default RequestDispatcher could add the
HttpServletRequest to the context...




Doesn't seem like
you can unless you put all the user authentication stuff and
application logic right in Solr.


  ;)

Exactly.  Sort of.


Now I guess you *could* stick everything in Solr that you would
normally stick in the middle tier, but it doesn't seem like a great
idea to me.


Let's be clear about where we are drawing the boundaries of the  
definition of Solr.


One could say that Solr is solr.war and the HTTP conventions.  Or is  
it solr.jar?  Or is it the SolrJ API?




all of the above :)

In my view we need to be clear about who solr.war is packaged for.  I
think we are pretty clear that solr.war should be thought of like a
MySQL install -- that is, a database server that, unless you *really*
know what you are doing, should most likely be behind a firewall.


solr.jar on the other hand lets you package what you want around  
search features to build a setup for your needs.  Java already has so  
many options for how to secure / authenticate that you can just plug  
them into your own app.  (if that is appropriate).  In the past I have  
used a filter based on:

http://www.onjava.com/pub/a/onjava/2004/03/24/loadcontrol.html
to limit load -- however I have found that in any site where stability/ 
load and uptime are a serious concern, this is better handled in a  
tier in front of java -- typically the loadbalancer / haproxy /  
whatever -- and managed by people more cautious than me.


ryan




Re: Solr security

2008-11-17 Thread Mark Miller

Ryan McKinley wrote:
solr.jar on the other hand lets you package what you want around 
search features to build a setup for your needs.  Java already has so 
many options for how to secure / authenticate that you can just plug 
them into your own app.  (if that is appropriate).  In the past I have 
used a filter based on:

http://www.onjava.com/pub/a/onjava/2004/03/24/loadcontrol.html
to limit load -- however I have found that in any site where 
stability/load and uptime are a serious concern, this is better 
handled in a tier in front of java -- typically the loadbalancer / 
haproxy / whatever -- and managed by people more cautious than me.


ryan

Couldn't agree more. Almost all security and protection belong outside 
of solr. It can and will be done better, and solr can stick to what it's 
good at. Smaller things like limiting complex query attacks or something 
seem more reasonable, but any real security should be provided 
elsewhere. Wouldn't that be odd if a bunch of open source products 
reimplemented network security layers and defenses on every project...




Re: Build Solr to run SolrJS

2008-11-17 Thread Matthias Epheser

Erik Hatcher schrieb:

However, it isn't currently suitable for wiring to SolrJS - Matthias and 
I will have to resolve that.


Just noticed that the VelocityResponseWriter in trunk is much reduced compared to my
last patch from 2008-07-25.

Moving the templates into a jar shouldn't be a problem. Setting the contentType 
is still possible, the methods{} wrapper may be moved into the template itself.


The crucial difference is the missing translation into a solrj response by 
specifying the vl.response parameter. This was intended to make template 
creation handier, because the queryResponse is much nicer to navigate.


If this conversion is too specific and shouldn't be in VelocityResponseWriter, 
would it be a problem to create a subclass inside contrib/javascript?



matthias




Erik






Re: Solr security

2008-11-17 Thread Matthias Epheser

Ryan McKinley schrieb:
 however I have found that in any site where
stability/load and uptime are a serious concern, this is better handled 
in a tier in front of java -- typically the loadbalancer / haproxy / 
whatever -- and managed by people more cautious than me.


Full ack. What do you think about the only solr related thing left, the 
parameter filtering/blocking (e.g. rows>1000)? Is this suitable to do it in a 
Filter delivered by solr? Of course as an optional alternative.




ryan






RE: Solr security

2008-11-17 Thread Feak, Todd
I see value in this in the form of protecting the client from itself.

For example, our Solr isn't accessible from the Internet. It's all
behind firewalls. But, the client applications can make programming
mistakes. I would love the ability to lock them down to a certain number
of rows, just in case someone typos and puts in 1000 instead of 100, or
the like.

Admittedly, testing and QA should catch these things, but sometimes it's
nice to put in a few safeguards to stop the obvious mistakes from
occurring.

-Todd Feak

-Original Message-
From: Matthias Epheser [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 17, 2008 9:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr security

Ryan McKinley schrieb:
  however I have found that in any site where
 stability/load and uptime are a serious concern, this is better
handled 
 in a tier in front of java -- typically the loadbalancer / haproxy / 
 whatever -- and managed by people more cautious than me.

Full ack. What do you think about the only solr related thing left,
the parameter filtering/blocking (e.g. rows>1000)? Is this suitable to do it
in a Filter delivered by solr? Of course as an optional alternative.

 
 ryan
 
 




solr 1.3: bug in phps response writer

2008-11-17 Thread Alok Dhir

Distributed queries:

curl 'http://devxen0:8983/solr/core0/select?shards=search3:8983/solr/core0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=php'

curl 'http://devxen0:8983/solr/core0/select?shards=search3:8983/solr/core0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=xml'

curl 'http://devxen0:8983/solr/core0/select?shards=search3:8983/solr/core0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=json'


All work fine, providing identical results in their respective formats  
(note the change in the wt param).


curl 'http://devxen0:8983/solr/core0/select?shards=search3:8983/solr/core0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=phps'


fails with:

java.lang.IllegalArgumentException: Map size must not be negative
	at org.apache.solr.request.PHPSerializedWriter.writeMapOpener(PHPSerializedResponseWriter.java:195)
	at org.apache.solr.request.JSONWriter.writeSolrDocument(JSONResponseWriter.java:392)
	at org.apache.solr.request.JSONWriter.writeSolrDocumentList(JSONResponseWriter.java:547)
	at org.apache.solr.request.TextResponseWriter.writeVal(TextResponseWriter.java:147)
	at org.apache.solr.request.JSONWriter.writeNamedListAsMapMangled(JSONResponseWriter.java:150)
	at org.apache.solr.request.PHPSerializedWriter.writeNamedList(PHPSerializedResponseWriter.java:71)
	at org.apache.solr.request.PHPSerializedWriter.writeResponse(PHPSerializedResponseWriter.java:66)
	at org.apache.solr.request.PHPSerializedResponseWriter.write(PHPSerializedResponseWriter.java:47)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
	at org.mortbay.jetty.Server.handle(Server.java:285)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
	at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)


Questions:

1) Is this known?  I didn't see it in the issue tracker.

2) What's the better course of action: a) download source, fix, submit a  
patch, wait for a new release; or b) drop phps and use json instead?


Thanks




Re: Build Solr to run SolrJS

2008-11-17 Thread Erik Hatcher


On Nov 17, 2008, at 11:45 AM, Matthias Epheser wrote:
Just noticed that the VelocityResponseWriter in trunk is much reduced  
compared to my last

patch from 2008-07-25.


Right, that was intentional for my own simplicity's sake...

The crucial difference is the missing translation into a solrj  
response by specifying the vl.response parameter. This was intended  
to make template creation handier, because the queryResponse is  
much nicer to navigate.


If this conversion is too specific and shouldn't be in  
VelocityResponseWriter, would it be a problem to create a subclass  
inside contrib/javascript?


I need to understand it a bit more, but no subclass is necessary...  
we'll patch it into contrib/velocity's VrW like you had it before.


Erik



Re: Solr security

2008-11-17 Thread Ryan McKinley


On Nov 17, 2008, at 12:06 PM, Matthias Epheser wrote:


Ryan McKinley schrieb:
however I have found that in any site where
stability/load and uptime are a serious concern, this is better  
handled in a tier in front of java -- typically the loadbalancer /  
haproxy / whatever -- and managed by people more cautious than me.


Full ack. What do you think about the only solr related thing  
left, the parameter filtering/blocking (e.g. rows>1000)? Is this  
suitable to do it in a Filter delivered by solr? Of course as an  
optional alternative.




This could be done in a standard ServletFilter -- but that requires  
mucking with web.xml and may be more difficult if you are worried  
about it for some Handlers and not others.


As Erik mentioned earlier, this could be done in a QueryComponent --  
the prepare part could just make sure the query parameters are all  
within reasonable ranges.  This seems like something reasonable to add  
to solr.


ryan
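The prepare-time range check suggested here reduces to clamping numeric parameters. Below is the core logic as standalone Java rather than against Solr's SearchComponent API; the ceiling of 100 is an arbitrary example (Solr's own default for rows is 10):

```java
// Sketch of parameter clamping as it might run in a SearchComponent's
// prepare() step: parse the incoming "rows" value and force it into range.
// The ceiling and default values are illustrative, not Solr-mandated.
public class RowClamp {

    // Return a safe value for the "rows" parameter: missing, unparsable,
    // or negative values fall back to the default; oversized values are capped.
    static int clampRows(String rowsParam, int def, int max) {
        int rows;
        try {
            rows = Integer.parseInt(rowsParam);
        } catch (NumberFormatException e) {
            return def;  // covers null and non-numeric input
        }
        if (rows < 0) return def;
        return Math.min(rows, max);
    }

    public static void main(String[] args) {
        System.out.println(clampRows("10", 10, 100));    // 10
        System.out.println(clampRows("1000", 10, 100));  // 100 (the typo case)
    }
}
```

Inside an actual SearchComponent, the same check would rewrite the rows parameter via ModifiableSolrParams before the query executes.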


Re: Build Solr to run SolrJS

2008-11-17 Thread Matthias Epheser

Erik Hatcher schrieb:


On Nov 17, 2008, at 11:45 AM, Matthias Epheser wrote:
Just noticed that the VelocityResponseWriter in trunk is much reduced compared to my 
last

patch from 2008-07-25.


Right, that was intentional for my own simplicity's sake...

The crucial difference is the missing translation into a solrj 
response by specifying the vl.response parameter. This was intended to 
make template creation handier, because the queryResponse is much 
nicer to navigate.


If this conversion is too specific and shouldn't be in 
VelocityResponseWriter, would it be a problem to create a subclass 
inside contrib/javascript?


I need to understand it a bit more, but no subclass is necessary... 
we'll patch it into contrib/velocity's VrW like you had it before.


The key part is to pass a parameter like vl.response=QueryResponse, so the 
transformation works like this:


object = request.getCore().getResourceLoader().newInstance(className, 
client.solrj.response.);


solrResponse.setResponse(new 
EmbeddedSolrServer(request.getCore()).getParsedResponse(request, response));


This was done based on api changes from Ryan to generalize the second 
setResponse part. In the template, there is access to the created response, as 
well as to the rawResponse.


I'll try to add the least necessary stuff to the current vrw, test it against 
solrjs and post a patch to jira.


matthias



Erik





Re: Build Solr to run SolrJS

2008-11-17 Thread Erik Hatcher
Can you elaborate on the use case for why you need the raw response  
like that?


I vaguely get it, but want to really understand the need here.

I'm wary of the EmbeddedSolrServer usage in there, as I want to  
distill the VrW stuff to be able to use SolrJ's API rather than assume  
embedded Solr.  This way VrW can be separated from core Solr to  
another tier and template on remote Solr responses.  Thoughts on how  
this feature might play out in that scenario?


Erik


On Nov 17, 2008, at 1:09 PM, Matthias Epheser wrote:

Erik Hatcher schrieb:

On Nov 17, 2008, at 11:45 AM, Matthias Epheser wrote:
Just noticed that the VelocityResponseWriter in trunk is much reduced  
compared to my last

patch from 2008-07-25.

Right, that was intentional for my own simplicity's sake...
The crucial difference is the missing translation into a solrj  
response by specifying the vl.response parameter. This was  
intended to make template creation handier, because the  
queryResponse is much nicer to navigate.


If this conversion is too specific and shouldn't be in  
VelocityResponseWriter, would it be a problem to create a subclass  
inside contrib/javascript?
I need to understand it a bit more, but no subclass is necessary...  
we'll patch it into contrib/velocity's VrW like you had it before.


The key part is to pass a parameter like vl.response=QueryResponse,  
so the transformation works like that:


object =  
request.getCore().getResourceLoader().newInstance(className,  
client.solrj.response.);


solrResponse.setResponse(new  
EmbeddedSolrServer(request.getCore()).getParsedResponse(request,  
response));


This was done based on api changes from Ryan to generalize the  
second setResponse part. In the template, there is access to the  
created response, as well as to the rawResponse.


I'll try to add the least necessary stuff to the current vrw, test it  
against solrjs and post a patch to jira.


matthias


   Erik




Re: Build Solr to run SolrJS

2008-11-17 Thread Ryan McKinley


On Nov 17, 2008, at 1:35 PM, Erik Hatcher wrote:

Can you elaborate on the use case for why you need the raw response  
like that?


I vaguely get it, but want to really understand the need here.

I'm wary of the EmbeddedSolrServer usage in there, as I want to  
distill the VrW stuff to be able to use SolrJ's API rather than  
assume embedded Solr.  This way VrW can be separated from core Solr  
to another tier and template on remote Solr responses.  Thoughts  
on how this feature might play out in that scenario?


Essentially the function:
 solrResponse.setResponse(new  
EmbeddedSolrServer(request.getCore()).getParsedResponse(request,  
response));

makes the results look as if they came from solrj.

If the results did come from solrj, we would not need to set the  
solrResponse -- they would already be set and in the proper form.


ryan



Re: Build Solr to run SolrJS

2008-11-17 Thread Matthias Epheser

Ryan McKinley schrieb:


On Nov 17, 2008, at 1:35 PM, Erik Hatcher wrote:

Can you elaborate on the use case for why you need the raw response 
like that?


I vaguely get it, but want to really understand the need here.

I'm wary of the EmbeddedSolrServer usage in there, as I want to 
distill the VrW stuff to be able to use SolrJ's API rather than assume 
embedded Solr.  This way VrW can be separated from core Solr to 
another tier and template on remote Solr responses.  Thoughts on how 
this feature might play out in that scenario?


After we added the SolrQueryResponse to the templates, we realized that some 
convenience methods for iterating the result docs, accessing facets etc. would 
be nice.


The idea was to reuse the existing wrappers (eg. QueryResponse). It makes it 
much nicer to create templates, because velocity is made to just render things, 
so code using docsets etc. directly may be very overloaded.




Essentially the function:
 solrResponse.setResponse(new 
EmbeddedSolrServer(request.getCore()).getParsedResponse(request, 
response));

makes the results look as if they came from solrj.

If the results did come from solrj, we would not need to set the 
solrResponse -- they would already be set and in the proper form.


ryan





Fwd: Software Announcement: LuSql: Database to Lucene indexing

2008-11-17 Thread Matthew Runo

Hello -

I wanted to forward this on, since I thought that people here might be  
able to use this to build indexes. So long as the lucene version in  
LuSQL matches the version in Solr, it would work fine for indexing -  
yea?


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

Begin forwarded message:


From: Glen Newton [EMAIL PROTECTED]
Date: November 17, 2008 4:32:18 AM PST
To: [EMAIL PROTECTED]
Subject: Software Announcement: LuSql: Database to Lucene indexing
Reply-To: [EMAIL PROTECTED]

LuSql is a simple but powerful tool for building Lucene indexes from
relational databases. It is a command-line Java application for the
construction of a Lucene index from an arbitrary SQL query of a
JDBC-accessible SQL database. It allows a user to control a number of
parameters, including the SQL query to use, individual
indexing/storage/term-vector nature of fields, analyzer, stop word
list, and other tuning parameters. In its default mode it uses
threading to take advantage of multiple cores.

LuSql can handle complex queries, allows for additional per record
sub-queries, and has a plug-in architecture for arbitrary Lucene
document manipulation. Its only dependencies are three Apache Commons
libraries, the Lucene core itself, and a JDBC driver.

LuSql has been extensively tested, including a large 6+ million
full-text & metadata journal article document collection, producing an
86GB Lucene index in ~13 hours.

http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql

Glen Newton

--

-

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





Re: Build Solr to run SolrJS

2008-11-17 Thread Erik Hatcher


On Nov 17, 2008, at 2:11 PM, Matthias Epheser wrote:
After we added the SolrQueryResponse to the templates, we  
realized that some convenience methods for iterating the result  
docs, accessing facets etc. would be nice.


The idea was to reuse the existing wrappers (eg. QueryResponse). It  
makes it much nicer to create templates, because velocity is made to  
just render things, so code using docsets etc. directly may be very  
overloaded.


Right, and well understood.  What I've put out there is a barebones  
skeleton, and there are lots of TODOs for these conveniences.  I want  
to get it using SolrJ's API for request/response rather than the more  
internal stuff we're using now.


Erik



Re: Build Solr to run SolrJS

2008-11-17 Thread Ryan McKinley


On Nov 17, 2008, at 2:59 PM, Erik Hatcher wrote:



On Nov 17, 2008, at 2:11 PM, Matthias Epheser wrote:
After we added the SolrQueryResponse to the templates, we  
realized that some convenience methods for iterating the result  
docs, accessing facets etc. would be nice.


The idea was to reuse the existing wrappers (eg. QueryResponse). It  
makes it much nicer to create templates, because velocity is made  
to just render things, so code using docsets etc. directly may be  
very overloaded.


Right, and well understood.  What I've put out there is a barebones  
skeleton, and there are lots of TODOs for these conveniences.  I  
want to get it using SolrJ's API for request/response rather than  
the more internal stuff we're using now.




I think the 'internal' mode is there because Yonik expressed  
concern about requiring the conversion of DocList to SolrDocumentList  
-- since Matthias had already done the work to access docs out of the  
DocList, I figured we should leave it in, even if I can't imagine  
using it.  (someone may be worried about the performance win of not  
serializing DocList)


ryan


Re: Software Announcement: LuSql: Database to Lucene indexing

2008-11-17 Thread Erik Hatcher
Yeah, it'd work, though not only does the version of Lucene need to  
match, but the field indexing/storage attributes need to jive as well  
- and that is the trickier part of the equation.


But yeah, LuSQL looks slick!

Erik


On Nov 17, 2008, at 2:17 PM, Matthew Runo wrote:


Hello -

I wanted to forward this on, since I thought that people here might  
be able to use this to build indexes. So long as the lucene version  
in LuSQL matches the version in Solr, it would work fine for  
indexing - yea?


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

Begin forwarded message:


From: Glen Newton [EMAIL PROTECTED]
Date: November 17, 2008 4:32:18 AM PST
To: [EMAIL PROTECTED]
Subject: Software Announcement: LuSql: Database to Lucene indexing
Reply-To: [EMAIL PROTECTED]

LuSql is a simple but powerful tool for building Lucene indexes from
relational databases. It is a command-line Java application for the
construction of a Lucene index from an arbitrary SQL query of a
JDBC-accessible SQL database. It allows a user to control a number of
parameters, including the SQL query to use, individual
indexing/storage/term-vector nature of fields, analyzer, stop word
list, and other tuning parameters. In its default mode it uses
threading to take advantage of multiple cores.

LuSql can handle complex queries, allows for additional per record
sub-queries, and has a plug-in architecture for arbitrary Lucene
document manipulation. Its only dependencies are three Apache Commons
libraries, the Lucene core itself, and a JDBC driver.

LuSql has been extensively tested, including a large 6+ million
full-text & metadata journal article document collection, producing an
86GB Lucene index in ~13 hours.

http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql

Glen Newton

--

-

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





Re: Build Solr to run SolrJS

2008-11-17 Thread Matthias Epheser

Erik Hatcher schrieb:


On Nov 17, 2008, at 2:11 PM, Matthias Epheser wrote:
After we added the SolrQueryResponse to the templates, we realized 
that some convenience methods for iterating the result docs, accessing 
facets etc. would be nice.


The idea was to reuse the existing wrappers (eg. QueryResponse). It 
makes it much nicer to create templates, because velocity is made to 
just render things, so code using docsets etc. directly may be very 
overloaded.


Right, and well understood.  What I've put out there is a barebones 
skeleton, and there are lots of TODOs for these conveniences.  I want to 
get it using SolrJ's API for request/response rather than the more 
internal stuff we're using now.


Got your point.

I just added a new patch at https://issues.apache.org/jira/browse/SOLR-620 that 
makes solrjs run again. It includes:


- support for response wrapping
- support for json wrap
- adding the v. prefix to all request parameters for consistency reasons.

I'm aware that some parts of these features may be achievable in a nicer way. As 
you know the SolrJ code better, thanks for your thoughts; I'll also try to dig 
into the SolrJ side to get a better picture. See this patch as the feature list 
I need for solrjs.


matthias



Erik





Re: Solr security

2008-11-17 Thread Ian Holsman

There was a patch by Sean Timm you should investigate as well.

It limited a query so it would take a maximum of X seconds to execute, 
and would just return the rows it had found in that time.



Feak, Todd wrote:

I see value in this in the form of protecting the client from itself.

For example, our Solr isn't accessible from the Internet. It's all
behind firewalls. But, the client applications can make programming
mistakes. I would love the ability to lock them down to a certain number
of rows, just in case someone typos and puts in 1000 instead of 100, or
the like.

Admittedly, testing and QA should catch these things, but sometimes it's
nice to put in a few safeguards to stop the obvious mistakes from
occurring.

-Todd Feak

-Original Message-
From: Matthias Epheser [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 17, 2008 9:07 AM

To: solr-user@lucene.apache.org
Subject: Re: Solr security

Ryan McKinley schrieb:
  however I have found that in any site where
  

stability/load and uptime are a serious concern, this is better

handled 
  
in a tier in front of java -- typically the loadbalancer / haproxy / 
whatever -- and managed by people more cautious than me.



Full ack. What do you think about the only solr related thing left,
the parameter filtering/blocking (e.g. rows>1000)? Is this suitable to do it
in a Filter delivered by solr? Of course as an optional alternative.


  

ryan







  




Re: Solr security

2008-11-17 Thread Sean Timm
http://issues.apache.org/jira/browse/SOLR-527 (An XML commit only 
request handler) is pertinent to this discussion as well.


-Sean

Ian Holsman wrote:

There was a patch by Sean Timm you should investigate as well.

It limited a query so it would take a maximum of X seconds to execute, 
and would just return the rows it had found in that time.



Feak, Todd wrote:

I see value in this in the form of protecting the client from itself.

For example, our Solr isn't accessible from the Internet. It's all
behind firewalls. But, the client applications can make programming
mistakes. I would love the ability to lock them down to a certain number
of rows, just in case someone typos and puts in 1000 instead of 100, or
the like.

Admittedly, testing and QA should catch these things, but sometimes it's
nice to put in a few safeguards to stop the obvious mistakes from
occurring.

-Todd Feak

-Original Message-
From: Matthias Epheser [mailto:[EMAIL PROTECTED] Sent: Monday, 
November 17, 2008 9:07 AM

To: solr-user@lucene.apache.org
Subject: Re: Solr security

Ryan McKinley schrieb:
  however I have found that in any site where
 

stability/load and uptime are a serious concern, this is better

handled  
in a tier in front of java -- typically the loadbalancer / haproxy / 
whatever -- and managed by people more cautious than me.



Full ack. What do you think about the only solr related thing left,
the parameter filtering/blocking (e.g. rows>1000)? Is this suitable to
do it in a Filter delivered by solr? Of course as an optional alternative.

 

ryan







  




RE: Solr security

2008-11-17 Thread Lance Norskog
About that read-only switch for Solr: one of the basic HTTP design
guidelines is that GET should only return values, and should never change
the state of the data. All changes to the data should be made with POST. (In
REST style guidelines, PUT, POST, and DELETE.) This prevents you from
passing around URLs in email that can destroy the index.  The first role of
security is to prevent accidents.

I would suggest two layers of read-only switch. 1) Open the Lucene index
in read-only mode. 2) Allow only search servers to accept GET requests.

Lance
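The two-layer read-only switch above hinges on classifying HTTP methods by whether they can mutate state. A minimal gate illustrating that REST convention — the class and method names are made up for illustration:

```java
import java.util.Arrays;
import java.util.List;

public class ReadOnlyGate {

    // Methods that may change server state under REST conventions.
    static final List<String> MUTATING = Arrays.asList("POST", "PUT", "DELETE");

    // In read-only mode, only non-mutating requests get through.
    static boolean allow(String method, boolean readOnly) {
        return !readOnly || !MUTATING.contains(method.toUpperCase());
    }

    public static void main(String[] args) {
        System.out.println(allow("GET", true));   // true
        System.out.println(allow("POST", true));  // false
        System.out.println(allow("POST", false)); // true
    }
}
```

In practice this check would sit in a servlet filter or the front-end tier, not in Solr itself, consistent with the thread's conclusion.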



Re: Solr security

2008-11-17 Thread Sean Timm
I believe the Solr replication scripts require POSTing a commit to read 
in the new index--so at least limited POST capability is required in 
most scenarios.


-Sean

Lance Norskog wrote:

About that read-only switch for Solr: one of the basic HTTP design
guidelines is that GET should only return values, and should never change
the state of the data. All changes to the data should be made with POST. (In
REST style guidelines, PUT, POST, and DELETE.) This prevents you from
passing around URLs in email that can destroy the index.  The first role of
security is to prevent accidents.

I would suggest two layers of read-only switch. 1) Open the Lucene index
in read-only mode. 2) Allow only search servers to accept GET requests.

Lance

  


Updating schema.xml without deleting index?

2008-11-17 Thread Jeff Lerman
I've tried searching for this answer all over but have found no results
thus far.  I am trying to add a new field to my schema.xml with a
default value of 0.  I have a ton of data indexed right now and it would
be very hard to retrieve all of the original sources to rebuild my
index.  So my question is...is there any way to send a command to SOLR
that tells it to re-index everything it has and include the new field I
added?

 

Thanks,

 

Jeff



Re: Solr security

2008-11-17 Thread Ian Holsman

if thats the case putting apache in front of it would be handy.

something like
<Limit POST>
order deny,allow
deny from all
allow from 192.168.0.1
</Limit>

might be helpful.

Sean Timm wrote:
I believe the Solr replication scripts require POSTing a commit to 
read in the new index--so at least limited POST capability is required 
in most scenarios.


-Sean

Lance Norskog wrote:

About that read-only switch for Solr: one of the basic HTTP design
guidelines is that GET should only return values, and should never 
change
the state of the data. All changes to the data should be made with 
POST. (In

REST style guidelines, PUT, POST, and DELETE.) This prevents you from
passing around URLs in email that can destroy the index.  The first 
role of

security is to prevent accidents.

I would suggest two layers of read-only switch. 1) Open the Lucene 
index

in read-only mode. 2) Allow only search servers to accept GET requests.

Lance

  






Re: Build Solr to run SolrJS

2008-11-17 Thread Matthias Epheser

Erik Hatcher schrieb:


On Nov 16, 2008, at 1:40 PM, Matthias Epheser wrote:
Matthias and Ryan - let's get SolrJS integrated into 
contrib/velocity.  Any objections/reservations?


As SolrJS may be used without velocity at all (using eg. 
ClientSideWidgets), is it possible to put it into contrib/javascript 
and create a dependency to contrib/velocity for ServerSideWidgets?


Sure, contrib/javascript sounds perfect.

If that's ok, I'll have a look at the directory structure and the 
current ant build.xml to make them fit into the common solr structure 
and build.


Awesome, thanks!


Just uploaded solrjs.zip to https://issues.apache.org/jira/browse/SOLR-868. It 
is intended to be extracted in contrib/javascript and supports the following ant 
targets:


* ant dist - creates a single js file and a jar that holds velocity templates.
* ant docs - creates js docs. test in browser: doc/index.html
* ant example-init - (depends on ant dist in the solr root) copies the current build 
of solr.war and solr-velocity.jar to example/testsolr/..

* ant example-start - starts the testsolr server on port 8983
* ant example-import - imports 3000 test data rows (requires a started 
testserver)




Erik





RE: abt Multicore

2008-11-17 Thread Nguyen, Joe
 
Any suggestions?
-Original Message-
From: Nguyen, Joe 
Sent: Monday, November 17, 2008 9:40 AM
To: 'solr-user@lucene.apache.org'
Subject: RE: abt Multicore

Are all the documents in the same search space?  That is, for a given
query, could any of the 10MM docs be returned?

If so, I don't think you need to worry about multicore.  You may however
need to put part of the index on various machines:
http://wiki.apache.org/solr/DistributedSearch 

I am also trying to decide between multicore and distributed
search. My concerns are as follows:

Does that mean having a single big schema with lot of fields?
Distributed Search requires that each document must have a unique key.
In this case, the unique key cannot be a primary key of a table.

I wonder how Solr performs in this case (distributed search vs.
multicore):

1.  Distributed search
a.  All documents are in a single index.  Would indexing a single document
lock the index and affect query performance?
b.  If multiple machines are used, Solr will need to query each machine
and merge the results.  This could also impact performance.
c.  Supports MoreLikeThis queries given a document id.

2.  Multicore
a.  Each table will be associated with a single core.  Indexing a
single document would lock only that core's index, so querying
documents on other cores won't be impacted.
b.  Querying documents across cores must be handled by the caller.
c.  Can't support MoreLikeThis queries, since a document id from one
core has no meaning on other cores.

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED]
Sent: Monday, November 17, 2008 6:09 AM
To: solr-user@lucene.apache.org
Subject: Re: abt Multicore

Are all the documents in the same search space?  That is, for a given
query, could any of the 10MM docs be returned?

If so, I don't think you need to worry about multicore.  You may however
need to put part of the index on various machines:
http://wiki.apache.org/solr/DistributedSearch

ryan


On Nov 17, 2008, at 3:47 AM, Raghunandan Rao wrote:

 Hi,

 I have an app running on weblogic and oracle. Oracle DB is quite huge;

 say some 10 millions of records. I need to integrate Solr for this and

 I am planning to use multicore. How can the multicore feature be used 
  at its best?



 -Raghu




Re: Regex Transformer Error

2008-11-17 Thread Ahmed Hammad
Hi All,

Although the HTMLStripStandardTokenizerFactory will remove HTML tags at
analysis time, the raw tags are still stored in the index and need to be
removed when displaying search results. In my case the HTML tags are not
needed at all, so I created an HTMLStripTransformer for the DIH to remove
the HTML tags and save space in the index. I used the HTML parser included
with Lucene (org.apache.lucene.demo.html). It performs well and worked for
me (while working with Lucene before moving to Solr).

What do you think? Is it worth contributing?

My best wishes,

Regards,
Ahmed
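
For reference, the tag-stripping regex discussed in this thread can be
sketched in plain Java (a simplified illustration, not Ahmed's actual DIH
transformer; the class and method names here are made up):

```java
import java.util.regex.Pattern;

public class HtmlStripSketch {
    // Reluctant match of anything between '<' and '>'; DOTALL lets '.'
    // cross newlines, so the (.|\n) alternation from the thread is
    // not needed.
    private static final Pattern TAG = Pattern.compile("<.*?>", Pattern.DOTALL);

    public static String strip(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("<p>Hello <b>world</b></p>"));
    }
}
```

Note that a regex-based stripper is fragile on malformed HTML, which is why
a real parser (such as the Lucene demo one mentioned above) is preferable.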

On Thu, Nov 6, 2008 at 2:39 AM, Norskog, Lance [EMAIL PROTECTED] wrote:

 There is a nice HTML stripper inside Solr.
 solr.HTMLStripStandardTokenizerFactory

 -Original Message-
 From: Ahmed Hammad [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, November 05, 2008 10:43 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Regex Transformer Error

 Hi,

 It works with the attribute regex="&lt;(.|\n)*?&gt;"

 Sorry for the disturbance.

 Regards,

 ahmd


 On Wed, Nov 5, 2008 at 8:18 PM, Ahmed Hammad [EMAIL PROTECTED] wrote:

  Hi,
 
  I am using the Solr 1.3 data import handler. One of my table fields has
  HTML tags, and I want to strip them out of the field text. So obviously I
  need the Regex Transformer.
 
  I added a transformer="RegexTransformer" attribute to my entity and a
  new field with:
 
  <field sourceColName="content" column="content" regex="English"
  replaceWith="X"/>
 
  Everything works fine. The text is replaced without any problem. The
  problem happened with my regular expression to strip HTML tags. So I
  use regex="<(.|\n)*?>". Of course the characters '<' and '>' are not
  allowed in XML. I tried the following: regex="&lt;(.|\n)*?&gt;" and
  regex="&#3C;(.|\n)*?&#3E;" but I get the following error:
 
  The value of attribute "regex" associated with an element type "field"

  must not contain the '<' character. at
  com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown
  Source) ...
 
  The full stack trace is following:
 
  *FATAL: Could not create importer. DataImporter config invalid
  org.apache.solr.common.SolrException: FATAL: Could not create
 importer.
  DataImporter config invalid at
  org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImport
  Handler.java:114)
  at
  org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody
  (DataImportHandler.java:206)
  at
  org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandle
  rBase.java:131) at
  org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at
  org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.
  java:303)
  at
  org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter
  .java:232)
  at
  org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Appli
  cationFilterChain.java:235)
  at
  org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFi
  lterChain.java:206)
  at
  org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperVa
  lve.java:233)
  at
  org.apache.catalina.core.StandardContextValve.invoke(StandardContextVa
  lve.java:191)
  at
  org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.ja
  va:128)
  at
  org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.ja
  va:102)
  at
  org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValv
  e.java:109)
  at
  org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java
  :286)
  at
  org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor
  .java:857)
  at
  org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.pro
  cess(Http11AprProtocol.java:565) at
  org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:150
  9) at java.lang.Thread.run(Unknown Source) Caused by:
  org.apache.solr.handler.dataimport.DataImportHandlerException:
  Exception occurred while initializing context Processing Document # at
  org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImp
  orter.java:176)
  at
  org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.ja
  va:93)
  at
  org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImport
  Handler.java:106) ... 17 more Caused by:
  org.xml.sax.SAXParseException: The value of attribute regex
  associated with an element type "field" must not contain the '<'
  character. at
  com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown
  Source) at
  com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unkn
  own
  Source) at
  org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImp
  orter.java:166)
  ... 19 more *
 
  *description* *The server encountered an internal error (FATAL: Could
  not create importer. DataImporter config invalid
  org.apache.solr.common.SolrException: FATAL: Could not create
 importer.
  DataImporter config invalid at
  org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImport
  Handler.java:114)
  at
  

Re: Solr security

2008-11-17 Thread Erik Hatcher
trouble is, you can also GET /solr/update, even all on the URL, no  
request body...


   http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3ESTREAMED%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true



Solr is a bad RESTafarian.

Getting warmer!

Erik


On Nov 17, 2008, at 4:11 PM, Ian Holsman wrote:


if thats the case putting apache in front of it would be handy.

something like
<Limit POST>
order deny,allow
deny from all
allow from 192.168.0.1
</Limit>

might be helpful.

Sean Timm wrote:
I believe the Solr replication scripts require POSTing a commit to  
read in the new index--so at least limited POST capability is  
required in most scenarios.


-Sean

Lance Norskog wrote:

About that read-only switch for Solr: one of the basic HTTP design
guidelines is that GET should only return values, and should never  
change
the state of the data. All changes to the data should be made with  
POST. (In
REST style guidelines, PUT, POST, and DELETE.) This prevents you  
from
passing around URLs in email that can destroy the index.  The  
first role of

security is to prevent accidents.

I would suggest two layers of read-only switch. 1) Open the  
Lucene index
in read-only mode. 2) Allow only search servers to accept GET  
requests.


Lance








Re: Solr security

2008-11-17 Thread Ryan McKinley


On Nov 17, 2008, at 4:20 PM, Erik Hatcher wrote:

trouble is, you can also GET /solr/update, even all on the URL, no  
request body...


  http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3ESTREAMED%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true



Solr is a bad RESTafarian.



but with Ian's options in the apache config, this would not work...   
rather it would only work if stream.body was a POST







Getting warmer!

Erik


On Nov 17, 2008, at 4:11 PM, Ian Holsman wrote:


if thats the case putting apache in front of it would be handy.

something like
<Limit POST>
order deny,allow
deny from all
allow from 192.168.0.1
</Limit>

might be helpful.

Sean Timm wrote:
I believe the Solr replication scripts require POSTing a commit to  
read in the new index--so at least limited POST capability is  
required in most scenarios.


-Sean

Lance Norskog wrote:
About that read-only switch for Solr: one of the basic HTTP  
design
guidelines is that GET should only return values, and should  
never change
the state of the data. All changes to the data should be made  
with POST. (In
REST style guidelines, PUT, POST, and DELETE.) This prevents you  
from
passing around URLs in email that can destroy the index.  The  
first role of

security is to prevent accidents.

I would suggest two layers of read-only switch. 1) Open the  
Lucene index
in read-only mode. 2) Allow only search servers to accept GET  
requests.


Lance










RE: Updating schema.xml without deleting index?

2008-11-17 Thread Nguyen, Joe
Don't know whether this would work... just speculating :-)

A.  You'll need to create a new schema with the new field, or you could
use a dynamic field in your current schema (assuming you have already
configured the default value to 0).
B.  Add a couple of new documents.
C.  Run the optimize script.  Since optimize consolidates all segments
into a single segment, at the end you'll have a single segment which
includes the new field.

Would that work?

-Original Message-
From: Jeff Lerman [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 17, 2008 12:45 PM
To: solr-user@lucene.apache.org
Subject: Updating schema.xml without deleting index?

I've tried searching for this answer all over but have found no results
thus far.  I am trying to add a new field to my schema.xml with a
default value of 0.  I have a ton of data indexed right now and it would
be very hard to retrieve all of the original sources to rebuild my
index.  So my question is...is there any way to send a command to SOLR
that tells it to re-index everything it has and include the new field I
added?

 

Thanks,

 

Jeff



Re: Solr security

2008-11-17 Thread Ian Holsman

Ryan McKinley wrote:


On Nov 17, 2008, at 4:20 PM, Erik Hatcher wrote:

trouble is, you can also GET /solr/update, even all on the URL, no 
request body...


  
http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3ESTREAMED%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true



Solr is a bad RESTafarian.



but with Ian's options in the apache config, this would not work...  
rather it would only work if stream.body was a POST


<Location /solr/update>

order deny,allow
deny from all
allow from 192.168.0.1
</Location>
?
or perhaps LocationMatch... but you get the picture.
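
Putting the two fragments from this thread together, a complete
(hypothetical) httpd.conf sketch for an Apache 2.2-style proxy in front of
Solr might look like the following; the path and the allowed address are
placeholders to adjust for your deployment:

```apache
# Deny all access to the update handler except from a trusted indexer,
# and additionally forbid write methods everywhere else under /solr.
<Location /solr/update>
    Order deny,allow
    Deny from all
    Allow from 192.168.0.1
</Location>

<Location /solr>
    <Limit POST PUT DELETE>
        Order deny,allow
        Deny from all
        Allow from 192.168.0.1
    </Limit>
</Location>
```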






Getting warmer!

Erik


On Nov 17, 2008, at 4:11 PM, Ian Holsman wrote:


if thats the case putting apache in front of it would be handy.

something like
<Limit POST>
order deny,allow
deny from all
allow from 192.168.0.1
</Limit>

might be helpful.

Sean Timm wrote:
I believe the Solr replication scripts require POSTing a commit to 
read in the new index--so at least limited POST capability is 
required in most scenarios.


-Sean

Lance Norskog wrote:

About that read-only switch for Solr: one of the basic HTTP design
guidelines is that GET should only return values, and should never 
change
the state of the data. All changes to the data should be made with 
POST. (In

REST style guidelines, PUT, POST, and DELETE.) This prevents you from
passing around URLs in email that can destroy the index.  The 
first role of

security is to prevent accidents.

I would suggest two layers of read-only switch. 1) Open the 
Lucene index
in read-only mode. 2) Allow only search servers to accept GET 
requests.


Lance













Re: solr 1.3: bug in phps response writer

2008-11-17 Thread Otis Gospodnetic
Hi Alok,

I don't think it's a known issue and 2. a) sounds like the best and most 
appreciated approach! :)


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





From: Alok Dhir [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Monday, November 17, 2008 12:36:25 PM
Subject: solr 1.3: bug in phps response writer

Distributed queries:

curl 
'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=php'

curl 
'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=xml'

curl 
'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=json'

All work fine, providing identical results in their respective formats (note 
the change in the wt param).

curl 
'http://devxen0:8983/solr/core0/select?shards=search3:8983/solr/core0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=phps'

fails with:

java.lang.IllegalArgumentException: Map size must not be negative
at 
org.apache.solr.request.PHPSerializedWriter.writeMapOpener(PHPSerializedResponseWriter.java:195)
at 
org.apache.solr.request.JSONWriter.writeSolrDocument(JSONResponseWriter.java:392)
at 
org.apache.solr.request.JSONWriter.writeSolrDocumentList(JSONResponseWriter.java:547)
at 
org.apache.solr.request.TextResponseWriter.writeVal(TextResponseWriter.java:147)
at 
org.apache.solr.request.JSONWriter.writeNamedListAsMapMangled(JSONResponseWriter.java:150)
at 
org.apache.solr.request.PHPSerializedWriter.writeNamedList(PHPSerializedResponseWriter.java:71)
at 
org.apache.solr.request.PHPSerializedWriter.writeResponse(PHPSerializedResponseWriter.java:66)
at 
org.apache.solr.request.PHPSerializedResponseWriter.write(PHPSerializedResponseWriter.java:47)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Questions:

1) Is this known?  I didn't see it in the issue tracker.

2) What's the better course of action: a) download the source, fix it, submit a
patch, and wait for a new release; or b) drop phps and use json instead?

Thanks

Query Response Doc Score - Int Value

2008-11-17 Thread Derek Springer
Hello,
I am currently performing a query to a Solr index I've set up and I'm trying
to 1) sort on the score and 2) sort on the date_created (a custom field I've
added). The sort command looks like: sort=score+desc,created_date+desc.

The gist of it is that I will 1) first return the most relevant results, then
2) within those results, return the most recent results. However, the issue
I have is that the score is a decimal value that is far too precise (e.g.
2.3518934 vs 2.2173865) and will therefore never collide and trigger the
secondary sort on the date.

The question I am asking is if anyone knows a way to produce a score that is
more coarse, or if it is possible to force the score to return as an
integer. That way I could have the results collide on the score more often
and therefore sort on the date as well.

Thanks!
-Derek


Re: solr 1.3: bug in phps response writer

2008-11-17 Thread James liu
I find the failing URL is not the same as the others
-- 
regards
j.L


Re: Using properties from core configuration in data-config.xml

2008-11-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
Nope, it is not possible as of now; the placeholders are not aware of
the core properties.
Is it possible to pass the values as request params? Request
parameters can be accessed.

You can raise an issue and we can address this separately


On Mon, Nov 17, 2008 at 7:57 PM,  [EMAIL PROTECTED] wrote:
 Hello,

 is it possible to use properties from core configuration in data-config.xml?
 I want to define the baseDir for DataImportHandler.


 I tried the following configuration:


 *** solr.xml ***

 <solr persistent="false">
   <cores adminPath="null">
     <core name="core0" instanceDir="/opt/solr/cores/core0">
       <property name="solrDataDir" value="/opt/solr/cores/core0/data" />
       <property name="xmlDataDir" value="/home/xml/core0" />
     </core>
     ...
   </cores>
 </solr>




 *** data-config.xml ***

 <dataConfig>
   <dataSource type="FileDataSource" />
   <document>
     <entity name="xmlFile"
             processor="FileListEntityProcessor"
             baseDir="${xmlDataDir}"
             fileName="id-.*\.xml"
             rootEntity="false"
             dataSource="null">
       <entity name="data"
               pk="id"
               url="${xmlFile.fileAbsolutePath}"
               processor="XPathEntityProcessor"
 ...
 </dataConfig>



 But this is the result:

 ...
 Nov 17, 2008 1:50:08 PM org.apache.solr.handler.dataimport.DataImporter 
 doFullImport
 INFO: Starting Full Import
 Nov 17, 2008 1:50:08 PM org.apache.solr.core.SolrCore execute
 INFO: [posts-politics] webapp=/solr path=/dataimport 
 params={optimize=true&commit=true&command=full-import&qt=/dataimport&wt=javabin&version=2.2}
  status=0 QTime=66
 Nov 17, 2008 1:50:08 PM org.apache.solr.core.SolrCore execute
 INFO: [posts-politics] webapp=/solr path=/dataimport 
 params={qt=/dataimport&wt=javabin&version=2.2} status=0 QTime=0
 Nov 17, 2008 1:50:08 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll
 INFO: [posts-politics] REMOVING ALL DOCUMENTS FROM INDEX
 Nov 17, 2008 1:50:08 PM org.apache.solr.handler.dataimport.DataImporter 
 doFullImport
 SEVERE: Full Import failed
 org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir' 
 should point to a directory Processing Document # 1
  at 
 org.apache.solr.handler.dataimport.FileListEntityProcessor.init(FileListEntityProcessor.java:81)
 ...




 I tried also to configure all dataimport settings in solrconfig.xml, but I 
 don't know how to do this exactly. Among other things, I tried this format:


 *** solrconfig.xml ***

 ...
 <requestHandler name="/dataimport"
     class="org.apache.solr.handler.dataimport.DataImportHandler">
   <lst name="defaults">
     <lst name="datasource">
       <str name="type">FileDataSource</str>
       <lst name="document">
         <lst name="entity">
           <str name="name">xmlFile</str>
           <str name="processor">FileListEntityProcessor</str>
           <str name="baseDir">${xmlDataDir}</str>
           <str name="fileName">id-.*\.xml</str>
           <str name="rootEntity">false</str>
           <str name="dataSource">null</str>
           <lst name="entity">
             <str name="name">data</str>
             <str name="pk">id</str>
             <str name="url">${xmlFile.fileAbsolutePath}</str>
 ...
 </requestHandler>
 ...



 But all my tests (with different dataimport formats in solrconfig.xml) 
 failed:


 ...
 INFO: Reusing parent classloader
 Nov 17, 2008 2:18:14 PM org.apache.solr.common.SolrException log
 SEVERE: Error in solrconfig.xml:org.apache.solr.common.SolrException: No 
 system property or default value specified for xmlFile.fileAbsolutePath
at 
 org.apache.solr.common.util.DOMUtil.substituteProperty(DOMUtil.java:311)
at 
 org.apache.solr.common.util.DOMUtil.substituteProperties(DOMUtil.java:264)
 ...



 Thanks again for your excellent support!

 Gisto

 --
 Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
 Ideal für Modem und ISDN: http://www.gmx.net/de/go/smartsurfer




-- 
--Noble Paul


Re: Regex Transformer Error

2008-11-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Tue, Nov 18, 2008 at 2:49 AM, Ahmed Hammad [EMAIL PROTECTED] wrote:
 Hi All,

 Although the HTMLStripStandardTokenizerFactory will remove HTML tags at
 analysis time, the raw tags are still stored in the index and need to be
 removed when displaying search results. In my case the HTML tags are not
 needed at all, so I created an HTMLStripTransformer for the DIH to remove
 the HTML tags and save space in the index. I used the HTML parser included
 with Lucene (org.apache.lucene.demo.html). It performs well and worked for
 me (while working with Lucene before moving to Solr).

 What do you think? Is it worth contributing?
Yes, you can contribute this new transformer as an enhancement.

 My best wishes,

 Regards,
 Ahmed

 On Thu, Nov 6, 2008 at 2:39 AM, Norskog, Lance [EMAIL PROTECTED] wrote:

 There is a nice HTML stripper inside Solr.
 solr.HTMLStripStandardTokenizerFactory

 -Original Message-
 From: Ahmed Hammad [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, November 05, 2008 10:43 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Regex Transformer Error

 Hi,

 It works with the attribute regex="&lt;(.|\n)*?&gt;"

 Sorry for the disturbance.

 Regards,

 ahmd


 On Wed, Nov 5, 2008 at 8:18 PM, Ahmed Hammad [EMAIL PROTECTED] wrote:

  Hi,
 
  I am using the Solr 1.3 data import handler. One of my table fields has
  HTML tags, and I want to strip them out of the field text. So obviously I
  need the Regex Transformer.
 
  I added a transformer="RegexTransformer" attribute to my entity and a
  new field with:
 
  <field sourceColName="content" column="content" regex="English"
  replaceWith="X"/>
 
  Everything works fine. The text is replaced without any problem. The
  problem happened with my regular expression to strip HTML tags. So I
  use regex="<(.|\n)*?>". Of course the characters '<' and '>' are not
  allowed in XML. I tried the following: regex="&lt;(.|\n)*?&gt;" and
  regex="&#3C;(.|\n)*?&#3E;" but I get the following error:
 
  The value of attribute "regex" associated with an element type "field"

  must not contain the '<' character. at
  com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown
  Source) ...
 
  The full stack trace is following:
 
  *FATAL: Could not create importer. DataImporter config invalid
  org.apache.solr.common.SolrException: FATAL: Could not create
 importer.
  DataImporter config invalid at
  org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImport
  Handler.java:114)
  at
  org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody
  (DataImportHandler.java:206)
  at
  org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandle
  rBase.java:131) at
  org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at
  org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.
  java:303)
  at
  org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter
  .java:232)
  at
  org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Appli
  cationFilterChain.java:235)
  at
  org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFi
  lterChain.java:206)
  at
  org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperVa
  lve.java:233)
  at
  org.apache.catalina.core.StandardContextValve.invoke(StandardContextVa
  lve.java:191)
  at
  org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.ja
  va:128)
  at
  org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.ja
  va:102)
  at
  org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValv
  e.java:109)
  at
  org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java
  :286)
  at
  org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor
  .java:857)
  at
  org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.pro
  cess(Http11AprProtocol.java:565) at
  org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:150
  9) at java.lang.Thread.run(Unknown Source) Caused by:
  org.apache.solr.handler.dataimport.DataImportHandlerException:
  Exception occurred while initializing context Processing Document # at
  org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImp
  orter.java:176)
  at
  org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.ja
  va:93)
  at
  org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImport
  Handler.java:106) ... 17 more Caused by:
  org.xml.sax.SAXParseException: The value of attribute regex
  associated with an element type "field" must not contain the '<'
  character. at
  com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown
  Source) at
  com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unkn
  own
  Source) at
  org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImp
  orter.java:166)
  ... 19 more *
 
  *description* *The server encountered an internal error (FATAL: Could
  not create importer. DataImporter config invalid
  org.apache.solr.common.SolrException: FATAL: Could not create
 importer.
  

Re: Query Response Doc Score - Int Value

2008-11-17 Thread Yonik Seeley
A function query is the likely candidate - no such quantization
function exists, but it would be relatively easy to write one.

-Yonik
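
As a sketch of the quantization idea Yonik mentions (done client-side for
illustration, assuming the scores are available in the response; nothing
like this ships with Solr 1.3, and the class name is made up):

```java
public class ScoreQuantizer {
    // Snap a relevance score down to a coarse bucket so near-equal
    // scores collide and the secondary sort (e.g. on date) kicks in.
    public static float quantize(float score, float bucketWidth) {
        return (float) Math.floor(score / bucketWidth) * bucketWidth;
    }

    public static void main(String[] args) {
        // The two example scores from the thread land in the same bucket.
        System.out.println(quantize(2.3518934f, 0.5f));
        System.out.println(quantize(2.2173865f, 0.5f));
    }
}
```

The bucket width trades relevance precision against how often the date sort
takes effect; it would need tuning against real score distributions.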

On Mon, Nov 17, 2008 at 8:17 PM, Derek Springer [EMAIL PROTECTED] wrote:
 Hello,
 I am currently performing a query to a Solr index I've set up and I'm trying
 to 1) sort on the score and 2) sort on the date_created (a custom field I've
 added). The sort command looks like: sort=score+desc,created_date+desc.

 The gist of it is that I will 1) first return the most relevant results then
 2) within those results, return the most recent results. However, the issue
 I have is that the score is a decimal value that is far to precise (e.g.
 2.3518934 vs 2.2173865) and will therefore never collide and trigger the
 secondary sort on the date.

 The question I am asking is if anyone knows a way to produce a score that is
 more coarse, or if it is possible to force the score to return as an
 integer. That way I could have the results collide on the score more often
 and therefore sort on the date as well.

 Thanks!
 -Derek



Re: abt Multicore

2008-11-17 Thread Shalin Shekhar Mangar
Some high level thoughts:

On Mon, Nov 17, 2008 at 11:10 PM, Nguyen, Joe [EMAIL PROTECTED]wrote:

 Are all the documents in the same search space?  That is, for a given
 query, could any of the 10MM docs be returned?

 If so, I don't think you need to worry about multicore.  You may however
 need to put part of the index on various machines:
 http://wiki.apache.org/solr/DistributedSearch 

 I am also trying to decide whether to go with multicore or distributed
 search. My concern is as follows:

 Does that mean having a single big schema with lot of fields?


Yes and that's the use-case behind multi-valued fields. De-normalizing and
avoiding joins helps to scale.


 Distributed Search requires that each document must have a unique key.
 In this case, the unique key cannot be a primary key of a table.

 I wonder how Solr performs in this case (distributed search vs.
 multicore)
 1.  Distributed Search
a.  All documents are in a single index.  Indexing a single document
 would lock the index and affect query performance?


Indexing does not lock out searchers. Solr is designed to serve queries
while indexing. However, depending on your machine's performance and
your configuration, you may see slow queries during commits/auto-warming.

Also, in distributed search, you have different Solr instances handling
disjoint sets of data. Indexing on one instance does not affect the rest.


b.  If multi machines are used, Solr will need to query each machine
 and merge the result.  This also could impact performance.


Yes, but in most scenarios where distributed search is required, it is just
not possible to use a single box for the whole index. If you set out to
write a similar kind of querying for multi-cores, it will be difficult to
optimize it as well as Solr's implementation.



C.  Support MoreLikeThis query given a document id.


MoreLikeThis is not implemented for distributed environments (yet).



 2.  Multicore
a.  Each table will be associated with a single core.  Indexing a
 single document would lock only a specific core's index.  Thus, querying
 documents on other cores won't be impacted.


With multi-core, all cores are on a single box, you may see slow queries on
other cores too (again, it depends on your box's strength).



B.  Querying documents across multiple cores must be handled by the
 caller.


That is not a use-case for which Lucene/Solr were designed. Joins are
discouraged most of the time.



C.  Can't support MoreLikeThis query since document id from one core
 has no meaning on other cores.


MoreLikeThis makes no sense in this case because the document structure
(schema) is totally different.




 -Original Message-
 From: Ryan McKinley [mailto:[EMAIL PROTECTED]
 Sent: Monday, November 17, 2008 6:09 PM
 To: solr-user@lucene.apache.org
 Subject: Re: abt Multicore

 Are all the documents in the same search space?  That is, for a given
 query, could any of the 10MM docs be returned?

 If so, I don't think you need to worry about multicore.  You may however
 need to put part of the index on various machines:
 http://wiki.apache.org/solr/DistributedSearch

 ryan


 On Nov 17, 2008, at 3:47 AM, Raghunandan Rao wrote:

  Hi,
 
  I have an app running on weblogic and oracle. Oracle DB is quite huge;

  say some 10 million records. I need to integrate Solr for this and

  I am planning to use multicore. How can the multicore feature be used at
  its best?
 
 
 
  -Raghu
 




-- 
Regards,
Shalin Shekhar Mangar.


Re: Using properties from core configuration in data-config.xml

2008-11-17 Thread Shalin Shekhar Mangar
There may be one way to do this.

Add your property in the invariant section of solrconfig's DataImportHandler
element. For example, add this section:

<lst name="invariants">
  <str name="xmlDataDir">${xmlDataDir}</str>
</lst>

Then you can use it as ${dataimporter.request.xmlDataDir} in your
data-config to access this.

On Tue, Nov 18, 2008 at 9:17 AM, Noble Paul നോബിള്‍ नोब्ळ् 
[EMAIL PROTECTED] wrote:

 nope . It is not possible as of now. the placeholders are not aware of
 the core properties.
 Is it possible to pass the values as request params? Request
 parameters can be accessed .

 You can raise an issue and we can address this separately


 On Mon, Nov 17, 2008 at 7:57 PM,  [EMAIL PROTECTED] wrote:
  Hello,
 
  is it possible to use properties from core configuration in
 data-config.xml?
  I want to define the baseDir for DataImportHandler.
 
 
  I tried the following configuration:
 
 
  *** solr.xml ***
 
  <solr persistent="false">
    <cores adminPath="null">
      <core name="core0" instanceDir="/opt/solr/cores/core0">
        <property name="solrDataDir" value="/opt/solr/cores/core0/data" />
        <property name="xmlDataDir" value="/home/xml/core0" />
      </core>
      ...
    </cores>
  </solr>
 
 
 
 
  *** data-config.xml ***
 
  <dataConfig>
    <dataSource type="FileDataSource" />
    <document>
      <entity name="xmlFile"
              processor="FileListEntityProcessor"
              baseDir="${xmlDataDir}"
              fileName="id-.*\.xml"
              rootEntity="false"
              dataSource="null">
        <entity name="data"
                pk="id"
                url="${xmlFile.fileAbsolutePath}"
                processor="XPathEntityProcessor"
  ...
  </dataConfig>
 
 
 
  But this is the result:
 
  ...
  Nov 17, 2008 1:50:08 PM org.apache.solr.handler.dataimport.DataImporter
 doFullImport
  INFO: Starting Full Import
  Nov 17, 2008 1:50:08 PM org.apache.solr.core.SolrCore execute
  INFO: [posts-politics] webapp=/solr path=/dataimport
 params={optimize=true&commit=true&command=full-import&qt=/dataimport&wt=javabin&version=2.2}
 status=0 QTime=66
  Nov 17, 2008 1:50:08 PM org.apache.solr.core.SolrCore execute
  INFO: [posts-politics] webapp=/solr path=/dataimport
 params={qt=/dataimport&wt=javabin&version=2.2} status=0 QTime=0
  Nov 17, 2008 1:50:08 PM org.apache.solr.update.DirectUpdateHandler2
 deleteAll
  INFO: [posts-politics] REMOVING ALL DOCUMENTS FROM INDEX
  Nov 17, 2008 1:50:08 PM org.apache.solr.handler.dataimport.DataImporter
 doFullImport
  SEVERE: Full Import failed
  org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir'
 should point to a directory Processing Document # 1
   at
 org.apache.solr.handler.dataimport.FileListEntityProcessor.init(FileListEntityProcessor.java:81)
  ...
 
 
 
 
  I tried also to configure all dataimport settings in solrconfig.xml, but
 I don't know how to do this exactly. Among other things, I tried this
 format:
 
 
  *** solrconfig.xml ***
 
  ...
  <requestHandler name="/dataimport"
      class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <lst name="datasource">
        <str name="type">FileDataSource</str>
        <lst name="document">
          <lst name="entity">
            <str name="name">xmlFile</str>
            <str name="processor">FileListEntityProcessor</str>
            <str name="baseDir">${xmlDataDir}</str>
            <str name="fileName">id-.*\.xml</str>
            <str name="rootEntity">false</str>
            <str name="dataSource">null</str>
            <lst name="entity">
              <str name="name">data</str>
              <str name="pk">id</str>
              <str name="url">${xmlFile.fileAbsolutePath}</str>
  ...
  </requestHandler>
  ...
 
 
 
  But all my tests (with different dataimport formats in solrconfig.xml)
 failed:
 
 
  ...
  INFO: Reusing parent classloader
  Nov 17, 2008 2:18:14 PM org.apache.solr.common.SolrException log
  SEVERE: Error in solrconfig.xml:org.apache.solr.common.SolrException: No
 system property or default value specified for xmlFile.fileAbsolutePath
 at
 org.apache.solr.common.util.DOMUtil.substituteProperty(DOMUtil.java:311)
 at
 org.apache.solr.common.util.DOMUtil.substituteProperties(DOMUtil.java:264)
  ...
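
That exception is expected: property substitution in solrconfig.xml happens once when the core loads, while DIH variables like ${xmlFile.fileAbsolutePath} only exist per row during an import, so they cannot appear in solrconfig.xml at all. The documented setup keeps the entity definitions in a separate data-config.xml and only references it from the handler's defaults, roughly:

```xml
<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <!-- path is resolved relative to the core's conf/ directory -->
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```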
 
 
 
  Thanks again for your excellent support!
 
  Gisto
 
 



 --
 --Noble Paul




-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr security

2008-11-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
If the user is using the new Java-based Solr replication, then he can get rid
of the /update and /update/csv handlers altogether, so the slaves are
completely read-only.
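
Sketched against a stock 1.4-era solrconfig.xml (handler names and classes as shipped; adjust to your own config), a read-only slave simply drops its update handlers:

```xml
<!-- slave solrconfig.xml sketch: update handlers commented out,
     leaving only the read path -->
<!--
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
<requestHandler name="/update/csv" class="solr.CSVRequestHandler" />
-->
<requestHandler name="/select" class="solr.SearchHandler" />
```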
--Noble



On Tue, Nov 18, 2008 at 2:14 AM, Sean Timm [EMAIL PROTECTED] wrote:
 I believe the Solr replication scripts require POSTing a commit to read in
 the new index--so at least limited POST capability is required in most
 scenarios.

 -Sean

 Lance Norskog wrote:

 About that read-only switch for Solr: one of the basic HTTP design
 guidelines is that GET should only return values, and should never change
 the state of the data. All changes to the data should be made with POST.
 (In
 REST style guidelines, PUT, POST, and DELETE.) This prevents you from
 passing around URLs in email that can destroy the index.  The first role
 of
 security is to prevent accidents.

 I would suggest two layers of read-only switch. 1) Open the Lucene index
 in read-only mode. 2) Allow only search servers to accept GET requests.

 Lance






-- 
--Noble Paul


Re: Solr security

2008-11-17 Thread Chris Hostetter

:  Full ack. What do you think about the only Solr-related thing left, the
:  parameter filtering/blocking (e.g. rows > 1000)? Is this suitable to do in a
:  Filter delivered by Solr? Of course as an optional alternative.

: As eric mentioned earlier, this could be done in a QueryComponent -- the
: prepare part could just make sure the query parameters are all within
: reasonable ranges.  This seems like something reasonable to add to solr.

i don't even see it requiring a new component -- the existing 
QueryComponent could treat this similar to the way the DismaxQParser deals 
with q and q.alt ... add two new params: start.max and rows.max that 
default to some very large values; QueryComponent respects start & rows 
only as long as they don't exceed the corresponding max; people that want 
to lock down their ports can make them invariants for the handlers that 
are exposed.
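
start.max and rows.max are proposed parameters, not existing ones; what the invariants mechanism can already do today is pin the value outright rather than cap it, for example:

```xml
<!-- sketch: an invariant fixes rows at 100 for this handler,
     overriding whatever the client sends -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="invariants">
    <int name="rows">100</int>
  </lst>
</requestHandler>
```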


-Hoss



Re: Query Response Doc Score - Int Value

2008-11-17 Thread Derek Springer
Thanks for the heads up. Can anyone point me to (or provide me with) an
example of writing a function query?

-Derek

On Mon, Nov 17, 2008 at 8:17 PM, Yonik Seeley [EMAIL PROTECTED] wrote:

 A function query is the likely candidate - no such quantization
 function exists, but it would be relatively easy to write one.

 -Yonik

 On Mon, Nov 17, 2008 at 8:17 PM, Derek Springer [EMAIL PROTECTED] wrote:
  Hello,
  I am currently performing a query to a Solr index I've set up and I'm
 trying
  to 1) sort on the score and 2) sort on the date_created (a custom field
 I've
  added). The sort command looks like: sort=score+desc,created_date+desc.
 
  The gist of it is that I will 1) first return the most relevant results
 then
  2) within those results, return the most recent results. However, the
 issue
  I have is that the score is a decimal value that is far too precise (e.g.
  2.3518934 vs 2.2173865) and will therefore never collide and trigger
 the
  secondary sort on the date.
 
  The question I am asking is if anyone knows a way to produce a score that
 is
  more coarse, or if it is possible to force the score to return as an
  integer. That way I could have the results collide on the score more
 often
  and therefore sort on the date as well.
 
  Thanks!
  -Derek
 




-- 
Derek B. Springer
Software Developer
Mahalo.com, Inc.
902 Colorado Ave.,
Santa Monica, CA 90401
[EMAIL PROTECTED]


Use SOLR like the MySQL LIKE

2008-11-17 Thread Carsten L

Hello.

The data:
I have a dataset containing ~500,000 documents.
In each document there is an email, a name and a user ID.

The problem:
I would like to be able to search in it, but it should be like the MySQL
LIKE.

So when a user enters the search term: carsten, then the query looks like:
name:(carsten) OR name:(carsten*) OR email:(carsten) OR
email:(carsten*) OR userid:(carsten) OR userid:(carsten*)

Then it should match:
carsten l
carsten larsen
Carsten Larsen
Carsten
CARSTEN
etc.

And when the user enters the term: "carsten l", the query looks like:
name:(carsten l) OR name:(carsten l*) OR email:(carsten l) OR
email:(carsten l*) OR userid:(carsten l) OR userid:(carsten l*)

Then it should match:
carsten l
carsten larsen
Carsten Larsen

Or written in MySQL syntax: ... WHERE `name` LIKE 'carsten%' OR
`email` LIKE 'carsten%' OR `userid` LIKE 'carsten%' ...

I know that I need to use solr.LowerCaseTokenizerFactory on my name
and email fields to ensure case-insensitive behavior.
The problem seems to be the wildcards and the whitespaces.
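
One common way to get LIKE 'carsten%' behaviour without wildcard queries is to index edge n-grams instead -- a sketch, with the field type name and gram sizes purely illustrative:

```xml
<fieldType name="prefix_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- keep the whole value as one token so "carsten l" can prefix-match
         across the space, then lowercase and emit every prefix as a term -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- query side: lowercase only, no n-grams -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With name, email and userid indexed this way, a plain query such as name:"carsten l" matches every value starting with "carsten l", whitespace included, so no wildcards are needed at query time.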
-- 
View this message in context: 
http://www.nabble.com/Use-SOLR-like-the-%22MySQL-LIKE%22-tp20554732p20554732.html
Sent from the Solr - User mailing list archive at Nabble.com.