[jira] Commented: (SOLR-221) faceting memory and performance improvement

2007-05-14 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495844
 ] 

Yonik Seeley commented on SOLR-221:
---

So for configuration, how about a SolrParam of
facet.minDfFilterCache  (can anyone think of a better name?), probably 
per-field.
We can defer more complex configuration in order to fit this into Solr 1.2, as 
long as we don't think this single parameter is a mistake.
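
A rough sketch of how such a threshold might sit next to the min-count check from the issue description. This is not the attached patch: the class, enum, and method names are made up for illustration, and treating the threshold as inclusive (df >= facet.minDfFilterCache) is an assumption.

```java
/** Hypothetical sketch of the two checks discussed in SOLR-221; not the patch itself. */
final class FacetStrategySketch {
    enum Strategy { SKIP, FILTER_CACHE, ITERATE_TERM_DOCS }

    /**
     * @param termDf           document frequency of the term being faceted on
     * @param minCountNeeded   smallest count that could still make it into the results
     * @param minDfFilterCache value of the proposed facet.minDfFilterCache parameter
     */
    static Strategy choose(int termDf, int minCountNeeded, int minDfFilterCache) {
        // The intersection count can never exceed the term's df, so a term whose
        // df is already below the count needed cannot qualify: skip the intersection.
        if (termDf < minCountNeeded) {
            return Strategy.SKIP;
        }
        // Assumption: frequent terms are worth a filterCache entry, while rare
        // terms are cheap enough to count by walking their TermDocs directly.
        if (termDf >= minDfFilterCache) {
            return Strategy.FILTER_CACHE;
        }
        return Strategy.ITERATE_TERM_DOCS;
    }
}
```

With a threshold of 10, a term with df=7 would be counted by iterating its TermDocs, and a term with df=50 would go through the filterCache.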

> faceting memory and performance improvement
> ---
>
> Key: SOLR-221
> URL: https://issues.apache.org/jira/browse/SOLR-221
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Assigned To: Yonik Seeley
> Attachments: facet.patch
>
>
> 1) compare minimum count currently needed to the term df and avoid 
> unnecessary intersection count
> 2) set a minimum term df in order to use the filterCache, otherwise iterate 
> over TermDocs

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (SOLR-230) make post.jar support better args for using tutorial

2007-05-14 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man reassigned SOLR-230:
-

Assignee: Hoss Man

barring objection, i'll plan on committing this later this week.

> make post.jar support better args for using tutorial
> 
>
> Key: SOLR-230
> URL: https://issues.apache.org/jira/browse/SOLR-230
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Hoss Man
> Assigned To: Hoss Man
> Attachments: SOLR-230.patch
>
>
> SOLR-86 created post.jar which eliminated the need for post.sh ... but as 
> noticed in 
> SOLR-164 there are still some cases in the tutorial that require direct use 
> of curl (deleting) and there are some nice things about post.sh that post.jar 
> doesn't support (defaulting the URL)
> this issue is to tackle some of the ideas Bertrand and I posted as a comment 
> in SOLR-86 after it was resolved
> Bertrand Delacretaz [19/Feb/07 12:35 AM] ...
> Considering the tutorial examples 
> (http://lucene.apache.org/solr/tutorial.html), it'd be useful to allow this 
> to POST its standard input, or the contents of a command-line parameter: ...
> Hoss Man [19/Feb/07 11:50 AM]
> yeah ... i think we should hardcode http://localhost:8983/solr/update with a 
> possible override by system prop, then add either a command line switch or 
> another system prop indicating to use the command line as filenames or as raw 
> data, and another op for stdin.
> java -jar -Ddata=files post.jar *.xml
> java -jar post.jar *.xml ... data=files being the default
> echo "name:DDR" | java -jar -Ddata=stdin post.jar
> cat *.xml | java -jar -Ddata=stdin post.jar
> java -jar -Ddata=args post.jar "name:DDR"
> java -jar -Durl=http://localhost:8983/solr/update post.jar *.xml 




[jira] Updated: (SOLR-230) make post.jar support better args for using tutorial

2007-05-14 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-230:
--

Attachment: SOLR-230.patch

patch that tackles all of these changes ... modifies SimplePostTool as well as 
the tutorial.

note two small differences between what i proposed and what I implemented...
  1) "cat *.xml | java -Ddata=stdin -jar post.jar" does not work because when 
reading from stdin we have one and only one stream to post, and the example 
files themselves contain the <add> blocks.  piping a single file to 
"java -Ddata=stdin -jar post.jar" does work however
  2) i added a "commit" system prop and defaulted it to "yes" ... this is 
needed because when deleting in the tutorial it wants to show off the pending 
deletes and the fact that the doc is still there until you commit.

for what it's worth there is now also simple support for a "-help" option, but i 
don't know if we should advertise it ... if anyone is using post.jar beyond 
what is described in the tutorial, they should really look at the code itself.
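
For readers who don't want to open the patch, here is a minimal sketch of how options like these can be read from system properties. It is not the SimplePostTool code: the class and field names are hypothetical, and only the property names ("url", "data", "commit"), the data modes, and the defaults come from this issue.

```java
import java.util.Properties;

/** Hypothetical sketch of the option handling described above; not SimplePostTool. */
final class PostOptionsSketch {
    static final String DEFAULT_URL = "http://localhost:8983/solr/update";

    final String url;
    final String dataMode;   // "files", "args" or "stdin"
    final boolean commit;

    PostOptionsSketch(Properties props) {
        // Each option falls back to the default discussed in the issue.
        url = props.getProperty("url", DEFAULT_URL);
        dataMode = props.getProperty("data", "files");
        commit = "yes".equals(props.getProperty("commit", "yes"));
        if (!dataMode.equals("files") && !dataMode.equals("args") && !dataMode.equals("stdin")) {
            throw new IllegalArgumentException("unknown data mode: " + dataMode);
        }
    }

    /** Reads the options from -Dname=value arguments passed to the JVM. */
    static PostOptionsSketch fromSystemProperties() {
        return new PostOptionsSketch(System.getProperties());
    }
}
```

Taking a Properties object in the constructor, instead of calling System.getProperty directly, keeps the defaults easy to test without touching JVM-wide state.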

> make post.jar support better args for using tutorial
> 
>
> Key: SOLR-230
> URL: https://issues.apache.org/jira/browse/SOLR-230
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Hoss Man
> Attachments: SOLR-230.patch
>
>
> SOLR-86 created post.jar which eliminated the need for post.sh ... but as 
> noticed in 
> SOLR-164 there are still some cases in the tutorial that require direct use 
> of curl (deleting) and there are some nice things about post.sh that post.jar 
> doesn't support (defaulting the URL)
> this issue is to tackle some of the ideas Bertrand and I posted as a comment 
> in SOLR-86 after it was resolved
> Bertrand Delacretaz [19/Feb/07 12:35 AM] ...
> Considering the tutorial examples 
> (http://lucene.apache.org/solr/tutorial.html), it'd be useful to allow this 
> to POST its standard input, or the contents of a command-line parameter: ...
> Hoss Man [19/Feb/07 11:50 AM]
> yeah ... i think we should hardcode http://localhost:8983/solr/update with a 
> possible override by system prop, then add either a command line switch or 
> another system prop indicating to use the command line as filenames or as raw 
> data, and another op for stdin.
> java -jar -Ddata=files post.jar *.xml
> java -jar post.jar *.xml ... data=files being the default
> echo "name:DDR" | java -jar -Ddata=stdin post.jar
> cat *.xml | java -jar -Ddata=stdin post.jar
> java -jar -Ddata=args post.jar "name:DDR"
> java -jar -Durl=http://localhost:8983/solr/update post.jar *.xml 




Re: update bundled lucene version?

2007-05-14 Thread Yonik Seeley

On 5/14/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
> On 14-May-07, at 1:01 PM, Yonik Seeley wrote:
>
> > I've audited the Lucene changes since 2.1, and don't see anything
> > problematic, so perhaps we should upgrade to the latest lucene trunk
> > to get:
> > - file descriptor usage reduction (only one descriptor for all
> > norms now)
> > - leading + trailing wildcard fix
> > - performance improvements (mainly lazy prox skipping)
> > - QueryParser parsing + escaping fixes
> > - minor sloppy phrase query fixes
>
> Sounds good to me.  I recall for some reason that the payload patch
> had been applied, but that either might be wrong or not a concern.


It has been applied, but it's of minimal concern since there isn't
currently a plan to utilize it in Solr, and if it's not used the file
formats don't change.

-Yonik


Re: [jira] Commented: (SOLR-238) [Patch] The tutorial on our website is against trunk which causes confusion by user

2007-05-14 Thread Chris Hostetter

: If I understand correctly, you would like to build the site with forrest
: so that the version number and the name of the dist (ant property
: ${fullnamever}) appear in the tutorial.

actually, i was just wondering about generic property replacement when
converting from the raw source docs to the HTML/PDF docs using a property
file ... i was assuming this file could be generated by the build.xml --
so even if ant isn't what kicks off forrest (we currently don't have it
set up that way) forrest would load the key/value mappings from a property
file generated by the last ant build.

: One idea I had was to use a filter with the copy task so that e.g.
: @fullnamever@ is substituted with ${fullnamever}. The problem is
: that it would then not be substituted on the live website.

the live site is currently updated manually by developers, anytime a change
is committed to the site dir ... so we can make sure the site is updated
anytime it needs to be if there is a simple process.  but in general i'm
not excited by the prospect of solving this problem using an ant copy
filter because that doesn't help when devs are previewing how the site
will look using forrest, as you say ...

: One could replace http://wiki.apache.org/solr/Website_Update_HOWTO step
: 2 of "Website update steps" with a target that is doing the filtering
: for you. Then in "forrest run" you would find @fullnamever@ but after
: building the site and using the copy target with filtering true you have
: the variable substituted. The problem is that the nightly builds would

that's why i was hoping forrest had a variable substitution mechanism
built into it that could just read from some file that we have ant
generate.

: also need to build the documentation with forrest. Letting forrest do
: the substitution and importing forrest targets into the solr build.xml is a
: similar approach but then you have an even bigger dependency on forrest.

as i say, i don't think the build.xml needs to depend on forrest (the nightly
builds don't currently regen the site) ... we just need the build.xml to
produce a property file or something like it (easy to do) and we need
forrest to read that file and fill in variables with the values it finds.

is there something like that in forrest?
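
To make the idea concrete, here is a minimal sketch of the kind of substitution being asked about: @name@ tokens in a document are replaced with values from a java.util.Properties object, which could be loaded (via Properties.load) from a file that ant writes. This is plain Java, not a forrest feature; the class name and the token pattern are assumptions for illustration.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of @token@ replacement driven by a property file. */
final class TokenFilterSketch {
    // Assumption: tokens look like @fullnamever@ (letters, digits, '_' and '.').
    private static final Pattern TOKEN = Pattern.compile("@([A-Za-z0-9_.]+)@");

    static String substitute(String text, Properties values) {
        Matcher m = TOKEN.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.getProperty(m.group(1));
            // Unknown tokens are left untouched rather than replaced with nothing.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Leaving unknown tokens in place means a missing property shows up visibly in the generated page instead of silently producing an empty version string.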



-Hoss



Re: update bundled lucene version?

2007-05-14 Thread Mike Klaas


On 14-May-07, at 1:01 PM, Yonik Seeley wrote:

> I've audited the Lucene changes since 2.1, and don't see anything
> problematic, so perhaps we should upgrade to the latest lucene trunk
> to get:
> - file descriptor usage reduction (only one descriptor for all
> norms now)
> - leading + trailing wildcard fix
> - performance improvements (mainly lazy prox skipping)
> - QueryParser parsing + escaping fixes
> - minor sloppy phrase query fixes

Sounds good to me.  I recall for some reason that the payload patch
had been applied, but that either might be wrong or not a concern.


-Mike


Re: [jira] Commented: (SOLR-238) [Patch] The tutorial on our website is against trunk which causes confusion by user

2007-05-14 Thread Thorsten Scherler
On Mon, 2007-05-14 at 11:20 -0700, Hoss Man (JIRA) wrote:
> [ 
> https://issues.apache.org/jira/browse/SOLR-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495706
>  ] 
> 
> Hoss Man commented on SOLR-238:
> ---
> 
> Thorsten ... thanks for the prod on this issue.  One thing that makes this 
> tricky is that the tutorial (and the entire website) are bundled with every 
> release ... that's why we keep the site up to date with the trunk, so that 
> people can review the docs as time goes on, but when a release is cut people 
> using that release should refer to the docs that come with it.
> 
> I'm not very knowledgeable in forrest, do you (or anyone else watching this 
> issue) know if there is an easy way to do variable substitution into the 
> generated docs when they are built using property files (or something like it)?
> 
> Then the docs could always contain the current Solr spec version number when 
> the tutorial is regenerated (for official releases, the spec version number 
> looks like 1.1, 1.2, etc... for nightly builds it looks like 
> 1.1.2007.05.11.10.10.53 -- the last official version number followed by the 
> current datetime)

Well the quickest way certainly is changing the skinconf.xml by hand.
However that will not be possible in the use-cases you describe (for
nightly builds).

For this case you would need something more sophisticated. 

If I understand correctly, you would like to build the site with forrest
so that the version number and the name of the dist (ant property
${fullnamever}) appear in the tutorial.

In the solr build.xml we define:

  


  

...
  

One idea I had was to use a filter with the copy task so that e.g.
@fullnamever@ is substituted with ${fullnamever}. The problem is
that it would then not be substituted on the live website.

One could replace http://wiki.apache.org/solr/Website_Update_HOWTO step
2 of "Website update steps" with a target that does the filtering
for you. Then in "forrest run" you would find @fullnamever@ but after
building the site and using the copy target with filtering true you have
the variable substituted. The problem is that the nightly builds would
also need to build the documentation with forrest. Letting forrest do
the substitution and importing forrest targets into the solr build.xml is a
similar approach but then you have an even bigger dependency on forrest.

I need to think about it but maybe meanwhile somebody on forrest-dev
(which I cc) has an idea.

salu2
-- 
Thorsten Scherler thorsten.at.apache.org
Open Source Java  consulting, training and solutions



[jira] Updated: (SOLR-239) Read IndexSchema from InputStream instead of Config file

2007-05-14 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-239:
--

Attachment: IndexSchemaStream.patch

patch with test cases attached.  i also had to change raw-schema.jsp to be a 
redirect to get-files.jsp; however, it wasn't clear whether raw-schema.jsp was 
still in use.

> Read IndexSchema from InputStream instead of Config file
> 
>
> Key: SOLR-239
> URL: https://issues.apache.org/jira/browse/SOLR-239
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.2
> Environment: all
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.2
>
> Attachments: IndexSchemaStream.patch
>
>
> The patch, soon to follow, adds a constructor to IndexSchema to allow schemas 
> to be created directly from InputStreams.  The overall logic for the Core's 
> creation and use of the IndexSchema does not change; however, this allows java 
> clients like those in SOLR-20 to parse an IndexSchema.  Once a schema is 
> parsed, the client can inspect an index's capabilities, which is useful for 
> building generic search UIs, e.g. providing a drop-down list of fields to 
> search/sort by.




[jira] Created: (SOLR-239) Read IndexSchema from InputStream instead of Config file

2007-05-14 Thread Will Johnson (JIRA)
Read IndexSchema from InputStream instead of Config file


 Key: SOLR-239
 URL: https://issues.apache.org/jira/browse/SOLR-239
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.2
 Environment: all
Reporter: Will Johnson
Priority: Minor
 Fix For: 1.2


The patch, soon to follow, adds a constructor to IndexSchema to allow schemas 
to be created directly from InputStreams.  The overall logic for the Core's 
creation and use of the IndexSchema does not change; however, this allows java 
clients like those in SOLR-20 to parse an IndexSchema.  Once a schema is 
parsed, the client can inspect an index's capabilities, which is useful for 
building generic search UIs, e.g. providing a drop-down list of fields to 
search/sort by.
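
To illustrate the drop-down use case without guessing at the new constructor's signature, here is a sketch that pulls field names out of a schema.xml stream with the JDK's own DOM parser. It does not use IndexSchema at all; the class and method names are hypothetical, and it assumes fields are declared as <field name="..."/> elements.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical client-side sketch; plain JDK XML parsing, not the IndexSchema API. */
final class SchemaFieldsSketch {
    /** Returns the name attribute of every <field> element, in document order. */
    static List<String> fieldNames(InputStream schemaXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(schemaXml);
            // Matches only elements named exactly "field" (not fieldType or dynamicField).
            NodeList fields = doc.getElementsByTagName("field");
            List<String> names = new ArrayList<String>();
            for (int i = 0; i < fields.getLength(); i++) {
                names.add(((Element) fields.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse schema", e);
        }
    }
}
```

A real client would presumably prefer the parsed IndexSchema, since it also knows field types and which fields are indexed or sortable; the sketch only shows that a stream is all a client needs to get started.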






update bundled lucene version?

2007-05-14 Thread Yonik Seeley

I've audited the Lucene changes since 2.1, and don't see anything
problematic, so perhaps we should upgrade to the latest lucene trunk
to get:
- file descriptor usage reduction (only one descriptor for all norms now)
- leading + trailing wildcard fix
- performance improvements (mainly lazy prox skipping)
- QueryParser parsing + escaping fixes
- minor sloppy phrase query fixes

-Yonik


[jira] Commented: (SOLR-238) [Patch] The tutorial on our website is against trunk which causes confusion by user

2007-05-14 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495706
 ] 

Hoss Man commented on SOLR-238:
---

Thorsten ... thanks for the prod on this issue.  One thing that makes this 
tricky is that the tutorial (and the entire website) are bundled with every 
release ... that's why we keep the site up to date with the trunk, so that 
people can review the docs as time goes on, but when a release is cut people 
using that release should refer to the docs that come with it.

I'm not very knowledgeable in forrest, do you (or anyone else watching this 
issue) know if there is an easy way to do variable substitution into the 
generated docs when they are built using property files (or something like it)?

Then the docs could always contain the current Solr spec version number when 
the tutorial is regenerated (for official releases, the spec version number 
looks like 1.1, 1.2, etc... for nightly builds it looks like 
1.1.2007.05.11.10.10.53 -- the last official version number followed by the 
current datetime)



> [Patch] The tutorial on our website is against trunk which causes confusion 
> by user
> ---
>
> Key: SOLR-238
> URL: https://issues.apache.org/jira/browse/SOLR-238
> Project: Solr
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Thorsten Scherler
> Attachments: SOLR-238.diff, SOLR-238.png
>
>
> The patch will add a note to the tutorial page with the following heads-up:
> "This is documentation for the development version (TRUNK). Some instructions 
> may only work if you are working against svn head."




[jira] Updated: (SOLR-236) Field collapsing

2007-05-14 Thread Emmanuel Keller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmanuel Keller updated SOLR-236:
-

Description: 
This patch includes a new feature called "Field collapsing".

"Used in order to collapse a group of results with similar value for a given 
field to a single entry in the result set. Site collapsing is a special case of 
this, where all results for a given web site is collapsed into one or two 
entries in the result set, typically with an associated "more documents from 
this site" link. See also Duplicate detection."
http://www.fastsearch.com/glossary.aspx?m=48&amid=299

The implementation adds 4 new query parameters (SolrParams):
- "collapse": set to true to enable collapsing
- "collapse.field": the field used to group results
- "collapse.type": normal (default value) or adjacent
- "collapse.max": how many continuous results are allowed before collapsing
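
As a rough illustration of how these parameters could interact (this is not the code in the patch), the sketch below assumes that "adjacent" only collapses runs of neighbouring results sharing a field value, that "normal" counts a value across the whole result list, and that collapse.max is the number of results per value (or per run) kept before the rest are collapsed. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of normal vs. adjacent collapsing; not the SOLR-236 patch. */
final class CollapseSketch {
    /**
     * @param values   the collapse.field value of each result, in ranked order
     * @param adjacent true for collapse.type=adjacent, false for normal
     * @param max      collapse.max
     * @return indexes of the results that survive collapsing
     */
    static List<Integer> collapse(List<String> values, boolean adjacent, int max) {
        List<Integer> kept = new ArrayList<Integer>();
        Map<String, Integer> seen = new HashMap<String, Integer>();
        String previous = null;
        int run = 0;
        for (int i = 0; i < values.size(); i++) {
            String v = values.get(i);
            int count;
            if (adjacent) {
                // Count only the current run of neighbouring results with this value.
                run = v.equals(previous) ? run + 1 : 1;
                previous = v;
                count = run;
            } else {
                // Count every earlier result with this value, wherever it appeared.
                Integer c = seen.get(v);
                count = (c == null) ? 1 : c + 1;
                seen.put(v, count);
            }
            if (count <= max) {
                kept.add(i);
            }
        }
        return kept;
    }
}
```

For results with field values a, a, b, a and collapse.max=1, the adjacent mode under these assumptions keeps positions 0, 2 and 3, while the normal mode keeps only 0 and 2.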

TODO (in progress):
- More documentation (on source code)
- Test cases


  was:
This patch includes a new feature called "Field collapsing".

"Used in order to collapse a group of results with similar value for a given 
field to a single entry in the result set. Site collapsing is a special case of 
this, where all results for a given web site is collapsed into one or two 
entries in the result set, typically with an associated "more documents from 
this site" link. See also Duplicate detection."
http://www.fastsearch.com/glossary.aspx?m=48&amid=299

The implementation adds 3 new query parameters (SolrParams):
"collapse" set to true to enable collapsing.
"collapse.field" to choose the field used to group results
"collapse.max" to select how many continuous results are allowed before 
collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases



> Field collapsing
> 
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.2
>Reporter: Emmanuel Keller
> Attachments: collapse_field.patch, collapse_field.patch, 
> field_collapsing.patch, field_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 4 new query parameters (SolrParams):
> "collapse" set to true to enable collapsing.
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases




[jira] Updated: (SOLR-236) Field collapsing

2007-05-14 Thread Emmanuel Keller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmanuel Keller updated SOLR-236:
-

Attachment: field_collapsing.patch

Corrects a bug in the previous version when using a value greater than 1 as 
the collapse.max parameter.

> Field collapsing
> 
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.2
>Reporter: Emmanuel Keller
> Attachments: collapse_field.patch, collapse_field.patch, 
> field_collapsing.patch, field_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse" set to true to enable collapsing.
> "collapse.field" to choose the field used to group results
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases




[jira] Updated: (SOLR-238) [Patch] The tutorial on our website is against trunk which causes confusion by user

2007-05-14 Thread Thorsten Scherler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thorsten Scherler updated SOLR-238:
---

Attachment: SOLR-238.png

screenshot 
Note the changed window title and the two new note boxes.

> [Patch] The tutorial on our website is against trunk which causes confusion 
> by user
> ---
>
> Key: SOLR-238
> URL: https://issues.apache.org/jira/browse/SOLR-238
> Project: Solr
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Thorsten Scherler
> Attachments: SOLR-238.diff, SOLR-238.png
>
>
> The patch will add a note to the tutorial page with the following heads-up:
> "This is documentation for the development version (TRUNK). Some instructions 
> may only work if you are working against svn head."




[jira] Updated: (SOLR-238) [Patch] The tutorial on our website is against trunk which causes confusion by user

2007-05-14 Thread Thorsten Scherler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thorsten Scherler updated SOLR-238:
---

Attachment: SOLR-238.diff

Patch of the forrest skinconf.xml

> [Patch] The tutorial on our website is against trunk which causes confusion 
> by user
> ---
>
> Key: SOLR-238
> URL: https://issues.apache.org/jira/browse/SOLR-238
> Project: Solr
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Thorsten Scherler
> Attachments: SOLR-238.diff
>
>
> The patch will add a note to the tutorial page with the following heads-up:
> "This is documentation for the development version (TRUNK). Some instructions 
> may only work if you are working against svn head."




[jira] Created: (SOLR-238) [Patch] The tutorial on our website is against trunk which causes confusion by user

2007-05-14 Thread Thorsten Scherler (JIRA)
[Patch] The tutorial on our website is against trunk which causes confusion by 
user
---

 Key: SOLR-238
 URL: https://issues.apache.org/jira/browse/SOLR-238
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Reporter: Thorsten Scherler


The patch will add a note to the tutorial page with the following heads-up:
"This is documentation for the development version (TRUNK). Some instructions 
may only work if you are working against svn head."
