RE: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-20 Thread Uwe Schindler
Congrats Jan!

 

Uwe

 

-

Uwe Schindler

Achterdiek 19, D-28357 Bremen

https://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Anshum Gupta  
Sent: Thursday, February 18, 2021 7:55 PM
To: Lucene Dev ; solr-user@lucene.apache.org
Subject: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

 

Hi everyone,

 

I’d like to inform everyone that the newly formed Apache Solr PMC nominated and 
elected Jan Høydahl for the position of the Solr PMC Chair and Vice President. 
This decision was approved by the board in its February 2021 meeting.

 

Congratulations Jan! 

 

-- 

Anshum Gupta



[SECURITY] CVE-2018-8026: XXE vulnerability due to Apache Solr configset upload (exchange rate provider config / enum field config / TIKA parsecontext)

2018-07-04 Thread Uwe Schindler
CVE-2018-8026: XXE vulnerability due to Apache Solr configset upload
(exchange rate provider config / enum field config / TIKA parsecontext)

Severity: High

Vendor:
The Apache Software Foundation

Versions Affected:
Solr 6.0.0 to 6.6.4
Solr 7.0.0 to 7.3.1

Description:
The details of this vulnerability were reported by mail to the Apache
security mailing list.
This vulnerability relates to an XML external entity expansion (XXE) in Solr
config files (currency.xml, enumsConfig.xml referred from schema.xml, and the
TIKA parsecontext config file). In addition, the XInclude functionality
provided in these config files is affected in a similar way. The vulnerability
can be exploited as an XXE attack using the file/ftp/http protocols to read
arbitrary local files from the Solr server or the internal network. The
manipulated files can be uploaded as configsets using Solr's API, enabling
exploitation of this vulnerability. See [1] for more details.
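To illustrate the attack shape (this is not Solr's actual patch, which routes resource loading through its ResourceLoader; the file name and entity below are hypothetical examples), the following sketch shows a crafted config file carrying an external entity, and a parser that refuses to resolve it:

```python
import xml.etree.ElementTree as ET

# A crafted currency.xml-style payload (hypothetical): the external entity
# tries to pull a local file off the server.
malicious = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE currencyConfig ['
    '<!ENTITY xxe SYSTEM "file:///etc/passwd">'
    ']>'
    '<currencyConfig>&xxe;</currencyConfig>'
)

# xml.etree does not resolve external entities: the reference stays
# undefined and parsing fails instead of leaking the file's contents.
try:
    ET.fromstring(malicious)
    print("parsed (unsafe)")
except ET.ParseError as err:
    print("rejected:", err)
```

A hardened parser behaves like this; the vulnerable Solr versions, by contrast, resolved such entities when the uploaded configset was loaded.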

Mitigation:
Users are advised to upgrade to either the Solr 6.6.5 or Solr 7.4.0 release,
both of which address the vulnerability. Once the upgrade is complete, no
other steps are required. Those releases only allow external entities and
XIncludes that refer to local files / ZooKeeper resources below the Solr
instance directory (using Solr's ResourceLoader); absolute URLs are denied.
Keep in mind that external entities and XInclude are explicitly supported to
better structure config files in large installations. Before Solr 6 this was
not a problem, as config files were not accessible through the APIs.

If users are unable to upgrade to Solr 6.6.5 or Solr 7.4.0, they are advised
to make sure that Solr instances are only used locally, without access to the
public internet, so the vulnerability cannot be exploited. In addition,
reverse proxies should be configured so that end users cannot reach the
configset APIs. Please refer to [2] on how to correctly secure Solr servers.

Solr 5.x and earlier are not affected by this vulnerability; those versions
do not allow uploading configsets via the API. Nevertheless, users should
upgrade those versions as soon as possible, because there may be other ways
to inject config files through the file upload functionality of the old web
interface. Those versions are no longer maintained, so no deep analysis was
done.

Credit:
Yuyang Xiao, Ishan Chattopadhyaya

References:
[1] https://issues.apache.org/jira/browse/SOLR-12450
[2] https://wiki.apache.org/solr/SolrSecurity

-
Uwe Schindler
uschind...@apache.org 
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/




[SECURITY] CVE-2018-8010: XXE vulnerability due to Apache Solr configset upload

2018-05-21 Thread Uwe Schindler
CVE-2018-8010: XXE vulnerability due to Apache Solr configset upload

Severity: High

Vendor:
The Apache Software Foundation

Versions Affected:
Solr 6.0.0 to 6.6.3
Solr 7.0.0 to 7.3.0

Description:
The details of this vulnerability were reported internally by one of Apache
Solr's committers.
This vulnerability relates to an XML external entity expansion (XXE) in Solr
config files (solrconfig.xml, schema.xml, managed-schema). In addition, the
XInclude functionality provided in these config files is affected in a
similar way. The vulnerability can be exploited as an XXE attack using the
file/ftp/http protocols to read arbitrary local files from the Solr server
or the internal network. See [1] for more details.

Mitigation:
Users are advised to upgrade to either the Solr 6.6.4 or Solr 7.3.1 release,
both of which address the vulnerability. Once the upgrade is complete, no
other steps are required. Those releases only allow external entities and
XIncludes that refer to local files / ZooKeeper resources below the Solr
instance directory (using Solr's ResourceLoader); absolute URLs are denied.
Keep in mind that external entities and XInclude are explicitly supported to
better structure config files in large installations. Before Solr 6 this was
not a problem, as config files were not accessible through the APIs.

If users are unable to upgrade to Solr 6.6.4 or Solr 7.3.1, they are advised
to make sure that Solr instances are only used locally, without access to the
public internet, so the vulnerability cannot be exploited. In addition,
reverse proxies should be configured so that end users cannot reach the
configset APIs. Please refer to [2] on how to correctly secure Solr servers.

Solr 5.x and earlier are not affected by this vulnerability; those versions
do not allow uploading configsets via the API. Nevertheless, users should
upgrade those versions as soon as possible, because there may be other ways
to inject config files through the file upload functionality of the old web
interface. Those versions are no longer maintained, so no deep analysis was
done.

Credit:
Ananthesh, Ishan Chattopadhyaya

References:
[1] https://issues.apache.org/jira/browse/SOLR-12316
[2] https://wiki.apache.org/solr/SolrSecurity

-
Uwe Schindler
uschind...@apache.org 
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/




[SECURITY] CVE-2018-1308: XXE attack through Apache Solr's DIH's dataConfig request parameter

2018-04-08 Thread Uwe Schindler
CVE-2018-1308: XXE attack through Apache Solr's DIH's dataConfig request 
parameter

Severity: Major

Vendor:
The Apache Software Foundation

Versions Affected:
Solr 1.2 to 6.6.2
Solr 7.0.0 to 7.2.1

Description:
The details of this vulnerability were reported to the Apache Security mailing 
list. 

This vulnerability relates to an XML external entity expansion (XXE) in the
`&dataConfig=` parameter of Solr's DataImportHandler. It can be exploited as
an XXE attack using the file/ftp/http protocols to read arbitrary local files
from the Solr server or the internal network. See [1] for more details.

Mitigation:
Users are advised to upgrade to either the Solr 6.6.3 or Solr 7.3.0 release,
both of which address the vulnerability. Once the upgrade is complete, no
other steps are required. Those releases disable external entities in
anonymous XML files passed through this request parameter.

If users are unable to upgrade to Solr 6.6.3 or Solr 7.3.0, they are advised
to disable the DataImportHandler in their solrconfig.xml file and restart
their Solr instances. Alternatively, if Solr instances are only used locally,
without access to the public internet, the vulnerability cannot be used
directly, so updating may not be required; instead, reverse proxies or Solr
client applications should be configured so that end users cannot inject
`dataConfig` request parameters. Please refer to [2] on how to correctly
secure Solr servers.
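The request-filtering guard mentioned above can be sketched as follows. The function name and rules are illustrative, not a Solr API; a real deployment would express this in the reverse proxy's own configuration language:

```python
from urllib.parse import urlparse, parse_qs

def is_request_allowed(url: str) -> bool:
    """Reject requests that could reach DIH with an inline config."""
    parsed = urlparse(url)
    # Block any attempt to pass an inline dataConfig payload.
    if "dataConfig" in parse_qs(parsed.query):
        return False
    # Stricter: block direct end-user access to the DIH endpoint entirely.
    if parsed.path.endswith("/dataimport"):
        return False
    return True

print(is_request_allowed("/solr/mycore/select?q=*:*"))  # True
print(is_request_allowed(
    "/solr/mycore/dataimport?command=full-import&dataConfig=x"))  # False
```

Note that this denies the whole DIH endpoint to end users; if legitimate clients need data import, allow only trusted source addresses instead.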

Credit:
麦 香浓郁

References:
[1] https://issues.apache.org/jira/browse/SOLR-11971
[2] https://wiki.apache.org/solr/SolrSecurity

-
Uwe Schindler
uschind...@apache.org 
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/




FOSS Backstage Micro Summit on Monday in Berlin

2017-11-17 Thread Uwe Schindler
Hi,

It's already a bit late, but for anyone visiting Germany next week who wants
to make a short trip to Berlin: there are still free slots at the FOSS
Backstage Micro Summit. It is a mini-conference on everything related to
governance, collaboration, legal and economics within the scope of FOSS. The
main event will take place as part of Berlin Buzzwords 2018. We have invited
a lot of speakers, including some from the ASF!

https://www.foss-backstage.de/

Program:
https://www.foss-backstage.de/news/micro-summit-program-online-now

I hope to see you there,
Uwe

-----
Uwe Schindler
uschind...@apache.org 
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/




RE: TIKA OCR not working

2015-04-27 Thread Uwe Schindler
Hi,
TIKA OCR definitely works automatically with Solr 5.x.

It is just important to have Tesseract OCR installed and on the PATH; it is
the native tool that does the actual OCR work. On Ubuntu Linux this should be
quite simple ("apt-get install tesseract-ocr" or similar). You may also need
to install additional language packs for better results.

As long as the native tools are installed, it should work out of the box with
no configuration needed. The one known exception is a Turkish-localized
machine, which triggers a JDK bug when spawning external processes. Please
also check the log files.
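A quick way to verify that prerequisite is a small PATH check (a generic sketch, not Solr- or Tika-specific):

```python
import shutil
import subprocess

def tesseract_available() -> bool:
    """True if the tesseract binary is reachable on PATH."""
    return shutil.which("tesseract") is not None

if tesseract_available():
    # Older tesseract builds print the version banner on stderr.
    out = subprocess.run(["tesseract", "--version"],
                         capture_output=True, text=True)
    banner = ((out.stdout or out.stderr).splitlines() or ["unknown"])[0]
    print("found:", banner)
else:
    print("tesseract not on PATH; OCR of images will silently yield no text")
```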

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Allison, Timothy B. [mailto:talli...@mitre.org]
> Sent: Monday, April 27, 2015 4:27 PM
> To: u...@tika.apache.org
> Cc: trung...@anlab.vn; solr-user@lucene.apache.org
> Subject: FW: TIKA OCR not working
> 
> Trung,
> 
> I haven't experimented with our OCR parser yet, but this should give a good
> start: https://wiki.apache.org/tika/TikaOCR .
> 
> Have you installed tesseract?
> 
> Tika colleagues,
>   Any other tips?  What else has to be configured and how?
> 
> -Original Message-
> From: trung.ht [mailto:trung...@anlab.vn]
> Sent: Friday, April 24, 2015 11:22 PM
> To: solr-user@lucene.apache.org
> Subject: Re: TIKA OCR not working
> 
> HI everyone,
> 
> Does anyone have the answer for this problem :)?
> 
> 
> I saw the document of Tika. Tika 1.7 support OCR and Solr 5.0 use Tika 1.7,
> > but it looks like it does not work. Does anyone know that TIKA OCR
> > works automatically with Solr or I have to change some settings?
> >
> >>
> Trung.
> 
> 
> > It's not clear if OCR would happen automatically in Solr Cell, or if
> >> changes to Solr would be needed.
> >>
> >> For Tika OCR info, see:
> >>
> >> https://issues.apache.org/jira/browse/TIKA-93
> >> https://wiki.apache.org/tika/TikaOCR
> >>
> >>
> >>
> >> -- Jack Krupansky
> >>
> >> On Thu, Apr 23, 2015 at 9:14 AM, Alexandre Rafalovitch <
> >> arafa...@gmail.com>
> >> wrote:
> >>
> >> > I think OCR is in Tika 1.8, so might be in Solr 5.?. But I haven't
> >> > seen
> >> it
> >> > in use yet.
> >> >
> >> > Regards,
> >> > Alex
> >> > On 23 Apr 2015 10:24 pm, "Ahmet Arslan" 
> >> wrote:
> >> >
> >> > > Hi Trung,
> >> > >
> >> > > I didn't know about OCR capabilities of tika.
> >> > > Someone who is familiar with solr-cell can inform us whether this
> >> > > functionality is added to solr or not.
> >> > >
> >> > > Ahmet
> >> > >
> >> > >
> >> > >
> >> > > On Thursday, April 23, 2015 2:06 PM, trung.ht 
> >> wrote:
> >> > > Hi Ahmet,
> >> > >
> >> > > I used a png file, not a pdf file. From the document, I
> >> > > understand
> >> that
> >> > > solr will post the file to tika, and since tika 1.7, OCR is included.
> >> Is
> >> > > there something I misunderstood.
> >> > >
> >> > > Trung.
> >> > >
> >> > >
> >> > > On Thu, Apr 23, 2015 at 5:59 PM, Ahmet Arslan
> >>  >> > >
> >> > > wrote:
> >> > >
> >> > > > Hi Trung,
> >> > > >
> >> > > > solr-cell (tika) does not do OCR. It cannot exact text from
> >> > > > image
> >> based
> >> > > > pdfs.
> >> > > >
> >> > > > Ahmet
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Thursday, April 23, 2015 7:33 AM, trung.ht
> >> > > > 
> >> > wrote:
> >> > > >
> >> > > >
> >> > > >
> >> > > > Hi,
> >> > > >
> >> > > > I want to use solr to index some scanned document, after
> >> > > > settings
> >> solr
> >> > > > document with a two field "content" and "filename", I tried to
> >> upload
> >> > the
> >> > > > attached file, but it seems that the content of the file is
> >> > > > only
> >> "\n \n
> >> > > > \n".

RE: TIKA OCR not working

2015-04-27 Thread Uwe Schindler
Yes that is fixed.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
> Sent: Monday, April 27, 2015 4:29 PM
> To: u...@tika.apache.org
> Cc: trung...@anlab.vn; solr-user@lucene.apache.org
> Subject: Re: TIKA OCR not working
> 
> It should work out of the box in Solr as long as Tesseract is installed and on
> the class path. Solr had an issue with it since Tika sends 2 startDocument 
> calls,
> but I fixed that with Uwe and it was shipped in 4.10.4 and in 5.x I think?
> 
> ++
> 
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398) NASA Jet
> Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> 
> Adjunct Associate Professor, Computer Science Department University of
> Southern California, Los Angeles, CA 90089 USA
> ++

ApacheCon NA 2015 in Austin, Texas

2015-03-19 Thread Uwe Schindler
Dear Apache Lucene/Solr enthusiast,

In just a few weeks, we'll be holding ApacheCon in Austin, Texas, and we'd love 
to have you in attendance. You can save $300 on admission by registering NOW, 
since the early bird price ends on the 21st.

Register at http://s.apache.org/acna2015-reg

ApacheCon this year celebrates the 20th birthday of the Apache HTTP Server, and 
we'll have Brian Behlendorf, who started this whole thing, keynoting for us, 
and you'll have a chance to meet some of the original Apache Group, who will be 
there to celebrate with us.

We also have talks about Apache Lucene and Apache Solr among 7 tracks of great 
content, as well as BOFs, the Apache BarCamp, project-specific hack events, and 
evening events where you can deepen your connection with the larger Apache 
community. See the full schedule at http://apacheconna2015.sched.org/

And if you have any questions, comments, or just want to hang out with us 
before and during the event, follow us on Twitter - @apachecon - or drop by 
#apachecon on the Freenode IRC network.

Hope to see you in Austin!

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/




RE: Reminder: FOSDEM 2015 - Open Source Search Dev Room

2014-12-03 Thread Uwe Schindler
Hello everyone,

We have extended the deadline for submissions to the FOSDEM 2015 Open Source 
Search Dev
Room to Monday, 9 December at 23:59 CET.

We are looking forward to your talk proposal!

Cheers,
Uwe

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/

> -Original Message-
> From: Uwe Schindler [mailto:uschind...@apache.org]
> Sent: Monday, November 24, 2014 9:33 AM
> To: d...@lucene.apache.org; java-u...@lucene.apache.org; solr-
> u...@lucene.apache.org; gene...@lucene.apache.org
> Subject: Reminder: FOSDEM 2015 - Open Source Search Dev Room
> 
> Hi,
> 
> We host a Dev-Room about "Open Source Search" on this year's FOSDEM
> 2015 (https://fosdem.org/2015/), taking place on January 31st and February
> 1st, 2015, in Brussels, Belgium. There is still one more week to submit your
> talks, so hurry up and submit your talk early!
> 
> Here is the full CFP as posted a few weeks ago:
> 
> Search has evolved to be much more than simply full-text search. We now
> rely on “search engines” for a wide variety of functionality:
> search as navigation, search as analytics and backend for data visualization
> and sometimes, dare we say it, as a data store. The purpose of this dev room
> is to explore the new world of open source search engines: their enhanced
> functionality, new use cases, feature and architectural deep dives, and the
> position of search in relation to the wider set of software tools.
> 
> We welcome proposals from folks working with or on open source search
> engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.)
> or technologies that heavily depend upon search (e.g.
> NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in
> presentations on search algorithms, machine learning, real-world
> implementation/deployment stories and explorations of the future of
> search.
> 
> Talks should be 30-60 minutes in length, including time for Q&A.
> 
> You can submit your talks to us here:
> https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V3
> 8G0OxSfp84A/viewform
> 
> Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We
> cannot guarantee we will have the opportunity to review submissions made
> after the deadline, so please submit early (and often)!
> 
> Should you have any questions, you can contact the Dev Room
> organizers: opensourcesearch-devr...@lists.fosdem.org
> 
> Cheers,
> LH on behalf of the Open Source Search Dev Room Program Committee*
> 
> * Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten
> Curdt, Uwe Schindler
> 
> -
> Uwe Schindler
> uschind...@apache.org
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org



Reminder: FOSDEM 2015 - Open Source Search Dev Room

2014-11-24 Thread Uwe Schindler
Hi,

We host a Dev-Room about "Open Source Search" on this year's FOSDEM 2015 
(https://fosdem.org/2015/), taking place on January 31st and February 1st, 
2015, in Brussels, Belgium. There is still one more week to submit your talks, 
so hurry up and submit your talk early!

Here is the full CFP as posted a few weeks ago:

Search has evolved to be much more than simply full-text search. We now rely on 
“search engines” for a wide variety of functionality:
search as navigation, search as analytics and backend for data visualization 
and sometimes, dare we say it, as a data store. The purpose of this dev room is 
to explore the new world of open source search engines: their enhanced 
functionality, new use cases, feature and architectural deep dives, and the 
position of search in relation to the wider set of software tools.

We welcome proposals from folks working with or on open source search engines 
(e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.) or 
technologies that heavily depend upon search (e.g.
NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in 
presentations on search algorithms, machine learning, real-world 
implementation/deployment stories and explorations of the future of search.

Talks should be 30-60 minutes in length, including time for Q&A.

You can submit your talks to us here:
https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V38G0OxSfp84A/viewform

Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We 
cannot guarantee we will have the opportunity to review submissions made after 
the deadline, so please submit early (and often)!

Should you have any questions, you can contact the Dev Room
organizers: opensourcesearch-devr...@lists.fosdem.org

Cheers,
LH on behalf of the Open Source Search Dev Room Program Committee*

* Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten Curdt, 
Uwe Schindler

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/




RE: FOSDEM 2015 - Open Source Search Dev Room

2014-11-03 Thread Uwe Schindler
Hi,

forgot to mention:
FOSDEM 2015 takes place in Brussels on January 31st and February 1st, 2015. See 
also: https://fosdem.org/2015/

I hope to see you there!
Uwe

> -Original Message-
> From: Uwe Schindler [mailto:uschind...@apache.org]
> Sent: Monday, November 03, 2014 1:29 PM
> To: d...@lucene.apache.org; java-u...@lucene.apache.org; solr-
> u...@lucene.apache.org; gene...@lucene.apache.org
> Subject: CFP: FOSDEM 2015 - Open Source Search Dev Room
> 
> ***Please forward this CFP to anyone who may be interested in
> participating.***
> 
> Hi,
> 
> Search has evolved to be much more than simply full-text search. We now
> rely on “search engines” for a wide variety of functionality:
> search as navigation, search as analytics and backend for data visualization
> and sometimes, dare we say it, as a data store. The purpose of this dev room
> is to explore the new world of open source search engines: their enhanced
> functionality, new use cases, feature and architectural deep dives, and the
> position of search in relation to the wider set of software tools.
> 
> We welcome proposals from folks working with or on open source search
> engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.)
> or technologies that heavily depend upon search (e.g.
> NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in
> presentations on search algorithms, machine learning, real-world
> implementation/deployment stories and explorations of the future of
> search.
> 
> Talks should be 30-60 minutes in length, including time for Q&A.
> 
> You can submit your talks to us here:
> https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V3
> 8G0OxSfp84A/viewform
> 
> Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We
> cannot guarantee we will have the opportunity to review submissions made
> after the deadline, so please submit early (and often)!
> 
> Should you have any questions, you can contact the Dev Room
> organizers: opensourcesearch-devr...@lists.fosdem.org
> 
> Cheers,
> LH on behalf of the Open Source Search Dev Room Program Committee*
> 
> * Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten
> Curdt, Uwe Schindler
> 
> -
> Uwe Schindler
> uschind...@apache.org
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
> 
> 
> 
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org



CFP: FOSDEM 2015 - Open Source Search Dev Room

2014-11-03 Thread Uwe Schindler
***Please forward this CFP to anyone who may be interested in participating.***

Hi,

Search has evolved to be much more than simply full-text search. We now rely on 
“search engines” for a wide variety of functionality:
search as navigation, search as analytics and backend for data visualization 
and sometimes, dare we say it, as a data store. The purpose of this dev room is 
to explore the new world of open source search engines: their enhanced 
functionality, new use cases, feature and architectural deep dives, and the 
position of search in relation to the wider set of software tools.

We welcome proposals from folks working with or on open source search engines 
(e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.) or 
technologies that heavily depend upon search (e.g.
NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in 
presentations on search algorithms, machine learning, real-world 
implementation/deployment stories and explorations of the future of search.

Talks should be 30-60 minutes in length, including time for Q&A.

You can submit your talks to us here:
https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V38G0OxSfp84A/viewform

Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We 
cannot guarantee we will have the opportunity to review submissions made after 
the deadline, so please submit early (and often)!

Should you have any questions, you can contact the Dev Room
organizers: opensourcesearch-devr...@lists.fosdem.org

Cheers,
LH on behalf of the Open Source Search Dev Room Program Committee*

* Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten Curdt, 
Uwe Schindler

-----
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/




[ANNOUNCE] [SECURITY] Recommendation to update Apache POI in Apache Solr 4.8.0, 4.8.1, and 4.9.0 installations

2014-08-18 Thread Uwe Schindler
Hallo Apache Solr Users,

the Apache Lucene PMC wants to make the users of Solr aware of  the following 
issue:

Apache Solr versions 4.8.0, 4.8.1, and 4.9.0 bundle Apache POI 3.10-beta2 with
their binary release tarballs. This version (and all previous ones) of Apache
POI is vulnerable to the following issues:

= CVE-2014-3529: XML External Entity (XXE) problem in Apache POI's OpenXML 
parser =
Type: Information disclosure
Description: Apache POI uses Java's XML components to parse OpenXML files 
produced by Microsoft Office products (DOCX, XLSX, PPTX,...). Applications that 
accept such files from end-users are vulnerable to XML External Entity (XXE) 
attacks, which allow remote attackers to bypass security restrictions and read 
arbitrary files via a crafted OpenXML document that provides an XML external 
entity declaration in conjunction with an entity reference.

= CVE-2014-3574: XML Entity Expansion (XEE) problem in Apache POI's OpenXML 
parser =
Type: Denial of service
Description: Apache POI uses Java's XML components and Apache Xmlbeans to parse 
OpenXML files produced by Microsoft Office products (DOCX, XLSX, PPTX,...). 
Applications that accept such files from end-users are vulnerable to XML Entity
Expansion (XEE) attacks ("XML bombs"), which allow remote attackers to consume
large amounts of CPU resources.
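The amplification behind such "XML bombs" can be seen at harmless scale with nested internal entities; each level doubles the output, so a few dozen levels expand a tiny file to gigabytes:

```python
import xml.etree.ElementTree as ET

# Three doubling levels: &c; expands to 2**3 = 8 characters.
# A real attack nests ~50 levels of entities instead of 3.
bomb = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE r ['
    '<!ENTITY a "aa">'
    '<!ENTITY b "&a;&a;">'
    '<!ENTITY c "&b;&b;">'
    ']>'
    '<r>&c;</r>'
)

text = ET.fromstring(bomb).text
print(len(text))  # 8
```

Parsers without entity-expansion limits (as in the affected POI versions) do exactly this work eagerly, which is why a kilobyte-sized document can pin a CPU for minutes.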

The Apache POI PMC released a bugfix version (3.10.1) today.

Solr users are affected by these issues, if they enable the "Apache Solr 
Content Extraction Library (Solr Cell)" contrib module from the folder 
"contrib/extraction" of the release tarball.

Users of Apache Solr are strongly advised to keep the module disabled if they 
don't use it. Alternatively, users of Apache Solr 4.8.0, 4.8.1, or 4.9.0 can 
update the affected libraries by replacing the vulnerable JAR files in the 
distribution folder. Users of previous versions have to update their Solr 
release first; patching older versions is impossible.

To replace the vulnerable JAR files follow these steps:

- Download the Apache POI 3.10.1 binary release: 
http://poi.apache.org/download.html#POI-3.10.1
- Unzip the archive
- Delete the following files in your "solr-4.X.X/contrib/extraction/lib" 
folder: 
# poi-3.10-beta2.jar
# poi-ooxml-3.10-beta2.jar
# poi-ooxml-schemas-3.10-beta2.jar
# poi-scratchpad-3.10-beta2.jar
# xmlbeans-2.3.0.jar
- Copy the following files from the base folder of the Apache POI distribution 
to the "solr-4.X.X/contrib/extraction/lib" folder: 
# poi-3.10.1-20140818.jar
# poi-ooxml-3.10.1-20140818.jar
# poi-ooxml-schemas-3.10.1-20140818.jar
# poi-scratchpad-3.10.1-20140818.jar
- Copy "xmlbeans-2.6.0.jar" from POI's "ooxml-lib/" folder to the 
"solr-4.X.X/contrib/extraction/lib" folder.
- Verify that the "solr-4.X.X/contrib/extraction/lib" folder no longer contains 
any files with version number "3.10-beta2".
- Verify that the folder contains one xmlbeans JAR file with version 2.6.0.

If you just want to disable extraction of Microsoft Office documents, delete 
the files above and don't replace them. "Solr Cell" will automatically detect 
this and disable Microsoft Office document extraction.

Coming versions of Apache Solr will have the updated libraries bundled.

Happy Searching and Extracting,
The Apache Lucene Developers

PS: Thanks to Stefan Kopf, Mike Boufford, and Christian Schneider for reporting 
these issues!

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/





[ANNOUNCE] Apache Solr 4.8.0 released

2014-04-28 Thread Uwe Schindler
28 April 2014, Apache Solr™ 4.8.0 available

The Lucene PMC is pleased to announce the release of Apache Solr 4.8.0

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search.  Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

Solr 4.8.0 is available for immediate download at:
  http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

See the CHANGES.txt file included with the release for a full list of
details.

Solr 4.8.0 Release Highlights:

* Apache Solr now requires Java 7 or greater (recommended is
  Oracle Java 7 or OpenJDK 7, minimum update 55; earlier versions
  have known JVM bugs affecting Solr).

* Apache Solr is fully compatible with Java 8.

* <fields> and <types> tags have been deprecated from schema.xml.
  There is no longer any reason to keep them in the schema file;
  they may be safely removed. This allows intermixing of <fieldType>,
  <field> and <copyField> definitions if desired.

* The new {!complexphrase} query parser supports wildcards, ORs etc.
  inside Phrase Queries. 

* New Collections API CLUSTERSTATUS action reports the status of
  collections, shards, and replicas, and also lists collection
  aliases and cluster properties.
 
* Added managed synonym and stopword filter factories, which enable
  synonym and stopword lists to be dynamically managed via REST API.

* JSON updates now support nested child documents, enabling {!child}
  and {!parent} block join queries. 

* Added ExpandComponent to expand results collapsed by the
  CollapsingQParserPlugin, as well as the parent/child relationship
  of nested child documents.

* Long-running Collections API tasks can now be executed
  asynchronously; the new REQUESTSTATUS action provides status.

* Added a hl.qparser parameter to allow you to define a query parser
  for hl.q highlight queries.

* In Solr single-node mode, cores can now be created using named
  configsets.

* New DocExpirationUpdateProcessorFactory supports computing an
  expiration date for documents from the "TTL" expression, as well as
  automatically deleting expired documents on a periodic basis. 
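
As a hedged illustration of the last highlight (the chain name, field names,
and period below are made up for this sketch; consult the Solr Reference
Guide for your version), a solrconfig.xml update chain using
DocExpirationUpdateProcessorFactory might look like:

```xml
<updateRequestProcessorChain name="add-expiration" default="true">
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <!-- field on incoming docs holding a TTL expression like "+30DAYS" -->
    <str name="ttlFieldName">time_to_live_s</str>
    <!-- the computed absolute expiration date is stored here -->
    <str name="expirationFieldName">expire_at_dt</str>
    <!-- check for and delete expired docs every 5 minutes -->
    <long name="autoDeletePeriodSeconds">300</long>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Documents indexed through this chain with a TTL value get an expiration
date computed at index time and are deleted automatically once it passes.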

Solr 4.8.0 also includes many other new features as well as numerous
optimizations and bugfixes of the corresponding Apache Lucene release.

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases.  It is possible that the mirror you are using
may not have replicated the release yet.  If that is the case, please
try another mirror.  This also goes for Maven access.

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Chair / Committer
Bremen, Germany
http://lucene.apache.org/




Attention: Lucene 4.8 and Solr 4.8 will require minimum Java 7

2014-03-12 Thread Uwe Schindler
Hi,

the Apache Lucene/Solr committers decided with a large majority on the vote to 
require Java 7 for the next minor release of Apache Lucene and Apache Solr 
(version 4.8)!
Oracle's support for Java 6 ended more than a year ago, and Java 8 is
coming out in a few days.

The next release will also contain some improvements for Java 7:
- Better file handling (especially on Windows) in the directory
implementations. Files can now be deleted on Windows even while the index is
still open - as was always possible on Unix environments (delete on last
close semantics).
- Speed improvements in sorting comparators: Sorting now uses Java 7's own
comparators for integer and long sorts, which are highly optimized by the
Hotspot VM.

If you want to stay up-to-date with Lucene and Solr, you should upgrade your
infrastructure to Java 7. Please be aware that you must use at least Java
7u1.
The recommended version at the moment is Java 7u25. Later versions like 7u40,
7u45,... have a bug causing index corruption. Ideally use the Java 7u60
prerelease, which has fixed this bug. Once 7u60 is out, it will be the
recommended version.
In addition, there is no Oracle/BEA JRockit available for Java 7; use the
official Oracle Java 7. JRockit never worked correctly with Lucene/Solr
(causing index corruption), so this should not be an issue for you. Please
also review our list of JVM bugs: http://wiki.apache.org/lucene-java/JavaBugs

Apache Lucene and Apache Solr were also heavily tested with all prerelease 
versions of Java 8, so you can also give it a try! Looking forward to the 
official Java 8 release next week - I will run my indexes with that version for 
sure!

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de





RE: solr bug feedback

2013-02-20 Thread Uwe Schindler
This is already fixed in Solr 4.1!

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de <http://www.thetaphi.de/> 

eMail: u...@thetaphi.de

 

From: 虛客 [mailto:itemdet...@qq.com] 
Sent: Wednesday, February 20, 2013 11:17 AM
To: solr-user
Subject: solr bug feedback

 

solr: 3.6.1 ---> Class: SolrRequestParsers ---> line: 75 has a manual
mistake:

"long uploadLimitKB = 1048;  // 2MB default" should be "long uploadLimitKB =
2048;  // 2MB default".

Thanks for open source!!!



RE: How to setup SimpleFSDirectoryFactory

2012-07-22 Thread Uwe Schindler
Hi Geetha Anjali,

Lucene will not use MMapDirectory by default on 32 bit platforms or if you
are not using an Oracle/Sun JVM. On 64 bit platforms, Lucene will use it, and
accepts the risk of segfaulting when unmapping the buffers - Lucene
does its best to prevent this. It is a risk, but one accepted by the Lucene
developers.

To come back to your issue: It is perfectly fine on Solr/Lucene to not unmap
all buffers as long as the index is open. The number of open file handles is
another discussion, but not related at all to MMap. If you are using an old
Lucene version (like 3.0.2), you should upgrade in any case; the most recent
one is 3.6.1.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: geetha anjali [mailto:anjaliprabh...@gmail.com]
> Sent: Monday, July 23, 2012 4:28 AM
> Subject: Re: How to setup SimpleFSDirectoryFactory
> 
> Hi Uwe,
> Thanks Uwe. Have you checked the bug in the JRE for MMapDirectory? I was
> mentioning this; it is posted on the Oracle site, and in the API doc.
> They accept this as a bug, have you seen this?
> 
> "MMapDirectory
> <http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/store/MMapDirectory.html>
> uses memory-mapped IO when reading. This is a good choice if you have
> plenty of virtual memory relative to your index size, eg if you are
> running on a 64 bit JRE, or you are running on a 32 bit JRE but your index
> sizes are small enough to fit into the virtual memory space. Java has
> currently the limitation of not being able to unmap files from user code.
> The files are unmapped, when GC releases the byte buffers. Due to this bug
> <http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4724038> in Sun's JRE,
> MMapDirectory's IndexInput.close()
> <http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/store/IndexInput.html#close%28%29>
> is unable to close the underlying OS file handle. Only when GC finally
> collects the underlying objects, which could be quite some time later,
> will the file handle be closed. This will consume additional transient
> disk usage: on Windows, attempts to delete or overwrite the files will
> result in an exception; on other platforms, which typically have a "delete
> on last close" semantics, while such operations will succeed, the bytes
> are still consuming space on disk. For many applications this limitation
> is not a problem (e.g. if you have plenty of disk space, and you don't
> rely on overwriting files on Windows) but it's still an important
> limitation to be aware of. This class supplies a (possibly dangerous)
> workaround mentioned in the bug report, which may fail on non-Sun JVMs."
> 
> 
> Thanks,
> 
> 
> On Mon, Jul 23, 2012 at 4:13 AM, Uwe Schindler  wrote:
> 
> > It is hopeless to talk to both of you, you don't understand virtual
> > memory:
> >
> > > I get a similar situation using Windows 2008 and Solr 3.6. Memory
> > > using mmap is never released. Even if I turn off traffic and commit
> > > and do a manual
> > > gc. If the size of the index is 3gb then memory used will be heap +
> > > 3gb of shared used. If I use a 6gb index I get heap + 6gb.
> >
> > That is expected, but we are talking not about allocated physical
> > memory, we are talking about allocated ADDRESS SPACE and you have 2^47
> > of that on 64bit platforms. There is no physical memory wasted or
> > allocated - please read the blog post a third, fourth, fifth... or
> > tenth time, until it is obvious. You should also go back to school and
> > take a course on system programming and operating system kernels.
> > Every CS student gets that taught in his first year (at least in
> > Germany).
> >
> > Java's GC has nothing to do with that - as long as the index is open,
> > ADDRESS SPACE is assigned. We are talking not about memory nor Java
> > heap space.
> >
> > > If I turn off
> > > MMapDirectoryFactory it goes back down. When is the MMap supposed to
> > > release memory? It only does it on JVM restart now.
> >
> > Can you please stop spreading nonsense about MMapDirectory with no
> > knowledge behind? http://www.linuxatemyram.com/ - Also applies to
> > Windows.
> >
> > Uwe
> >
> > > Bill Bell
> > > Sent from mobile
> > >
> > >
> > > On Jul 22, 2012, at 6:21 AM, geetha anjali
> > >  wrote:
> > > > It happens in 3.6, for this reasons I thought of moving to solandra.
> > > > If I do a commit, all documents are persisted without any
>

RE: How to setup SimpleFSDirectoryFactory

2012-07-22 Thread Uwe Schindler
It is hopeless to talk to both of you, you don't understand virtual memory:

> I get a similar situation using Windows 2008 and Solr 3.6. Memory using
> mmap is never released. Even if I turn off traffic and commit and do a
> manual
> gc. If the size of the index is 3gb then memory used will be heap + 3gb of
> shared used. If I use a 6gb index I get heap + 6gb.

That is expected, but we are talking not about allocated physical memory, we
are talking about allocated ADDRESS SPACE and you have 2^47 of that on 64bit
platforms. There is no physical memory wasted or allocated - please read the
blog post a third, fourth, fifth... or tenth time, until it is obvious. You
should also go back to school and take a course on system programming and
operating system kernels. Every CS student gets that taught in his first
year (at least in Germany).

Java's GC has nothing to do with that - as long as the index is open,
ADDRESS SPACE is assigned. We are talking not about memory nor Java heap
space.
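
The heap-versus-address-space distinction can be seen in a small
stand-alone sketch using only the JDK (the class and file names here are
illustrative): FileChannel.map() returns a buffer backed by mapped address
space outside the Java heap, and the kernel faults pages in lazily.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MMapDemo {
    // Maps a freshly created temp file of the given size and returns the
    // number of bytes of ADDRESS SPACE the mapping occupies. No Java heap
    // is consumed for the file contents themselves.
    static long mapAndMeasure(int sizeBytes) throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".bin");
        file.toFile().deleteOnExit(); // mapping may outlive the channel until GC
        Files.write(file, new byte[sizeBytes]);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Touching the buffer faults pages in; until then only address
            // space is reserved, not physical memory.
            buf.get(0);
            return buf.capacity();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("mapped address space: " + mapAndMeasure(1 << 20) + " bytes");
    }
}
```

Tools like top report the mapping as part of the process's virtual size,
but it is backed by the file system cache, not by malloc'd or heap memory.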

> If I turn off
> MMapDirectoryFactory it goes back down. When is the MMap supposed to
> release memory? It only does it on JVM restart now.

Can you please stop spreading nonsense about MMapDirectory with no knowledge
behind? http://www.linuxatemyram.com/ - Also applies to Windows.

Uwe

> Bill Bell
> Sent from mobile
> 
> 
> On Jul 22, 2012, at 6:21 AM, geetha anjali 
> wrote:
> > It happens in 3.6, for this reasons I thought of moving to solandra.
> > If I do a commit, all documents are persisted without any issues.
> > There is no issue in terms of any functionality, but only this
> > happens is increase in physical RAM, goes higher and higher and stops
> > at maximum and it never comes down.
> >
> > Thanks
> >
> > On Sun, Jul 22, 2012 at 3:38 AM, Lance Norskog 
> wrote:
> >
> >> Interesting. Which version of Solr is this? What happens if you do a
> >> commit?
> >>
> >> On Sat, Jul 21, 2012 at 8:01 AM, geetha anjali
> >> wrote:
> >>> Hi uwe,
> >>> Great to know. We have files indexing 1/min. After 30 mins I see
> >>> all my physical memory say its 100 percentage used(windows). On
> >>> deep investigation found that mmap is not releasing os files handles.
Do
> you find this behaviour?
> >>>
> >>> Thanks
> >>>
> >>> On 20 Jul 2012 14:04, "Uwe Schindler"  wrote:
> >>>
> >>> Hi Bill,
> >>>
> >>> MMapDirectory uses the file system cache of your operating system,
> >>> which has following consequences: In Linux, top & free should
> >>> normally report only *few* free memory, because the O/S uses all
> >>> memory not allocated by applications to cache disk I/O (and shows it
> >>> as allocated, so having 0%
> >> free
> >>> memory is just normal on Linux and also Windows). If you have other
> >>> applications or Lucene/Solr itself that allocate lot's of heap space
> >>> or
> >>> malloc() a lot, then you are reducing free physical memory, so
> >>> reducing
> >> fs
> >>> cache. This depends also on your swappiness parameter (if swappiness
> >>> is higher, inactive processes are swapped out easier, default is 60%
> >>> on
> >> linux -
> >>> freeing more space for FS cache - the backside is of course that
> >>> maybe in-memory structures of Lucene and other applications get pages
> out).
> >>>
> >>> You will only see no paging at all if all memory allocated all
> >> applications
> >>> + all mmapped files fit into memory. But paging in/out the mmapped
> >>> Lucene
> >>> index is much cheaper than using SimpleFSDirectory or
> >> NIOFSDirectory. If
> >>> you use SimpleFS or NIO and your index is not in FS cache, it will
> >>> also
> >> read
> >>> it from physical disk again, so where is the difference. Paging is
> >> actually
> >>> cheaper as no syscalls are involved.
> >>>
> >>> If you want as much as possible of your index in physical RAM, copy
> >>> it to /dev/null regularly and buy more RAM :-)
> >>>
> >>>
> >>> -
> >>> Uwe Schindler
> >>> H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
> >>> eMail: uwe@thetaphi...
> >>>
> >>>> From: Bill Bell [mailto:billnb...@gmail.com]
> >>>> Sent: Friday, July 20, 2012 5:17 AM
> >>>> Subject: Re: ...

RE: RE: How to setup SimpleFSDirectoryFactory

2012-07-22 Thread Uwe Schindler
Hi,

It seems that both of you simply don't understand what's happening in your
operating system kernel. Please read the blog post again!

> It happens in 3.6, for this reasons I thought of moving to solandra.
> If I do a commit, all documents are persisted without any issues.
> There is no issue in terms of any functionality, but only this happens
> is increase in physical RAM, goes higher and higher and stops at maximum
> and it never comes down.

This is perfectly fine in Windows and Linux (and any other operating
system). If an operating system would not use *all* available physical
memory it would waste costly hardware resources. Why not use resources that
are unused otherwise? As said before:

O/S kernel uses *all* available physical RAM for caching file system
accesses. The memory used for that is always reported as not free, because
it is used (very simple, right?). But if some other application wants to use
it, it's free for malloc(), so it is not permanently occupied. That's always
that case, using MMapDirectory or not (same for SimpleFSDirectory or
NIOFSDirectory).

Of course, when you freshly booted your kernel, it reports free memory, but
definitely not on a server running 24/7 since weeks.

For all people who don't want to understand that, here is the easy
explanation page:
http://www.linuxatemyram.com/

> > > all my physical memory say its 100 percentage used(windows). On deep
> > > investigation found that mmap is not releasing os files handles. Do
> > > you find this behaviour?

One comment: The file handles are not freed as long as the index is open.
Used file handles have nothing to do with memory mapping, it's completely
unrelated to each other.

Uwe

> On Sun, Jul 22, 2012 at 3:38 AM, Lance Norskog  wrote:
> 
> > Interesting. Which version of Solr is this? What happens if you do a
> > commit?
> >
> > On Sat, Jul 21, 2012 at 8:01 AM, geetha anjali
> > 
> > wrote:
> > > Hi uwe,
> > > Great to know. We have files indexing 1/min. After 30 mins I see
> > > all my physical memory say its 100 percentage used(windows). On deep
> > > investigation found that mmap is not releasing os files handles. Do
> > > you find this behaviour?
> > >
> > > Thanks
> > >
> > > On 20 Jul 2012 14:04, "Uwe Schindler"  wrote:
> > >
> > > Hi Bill,
> > >
> > > MMapDirectory uses the file system cache of your operating system,
> > > which
> > has
> > > following consequences: In Linux, top & free should normally report
> > > only
> > > *few* free memory, because the O/S uses all memory not allocated by
> > > applications to cache disk I/O (and shows it as allocated, so having
> > > 0%
> > free
> > > memory is just normal on Linux and also Windows). If you have other
> > > applications or Lucene/Solr itself that allocate lot's of heap space
> > > or
> > > malloc() a lot, then you are reducing free physical memory, so
> > > reducing
> > fs
> > > cache. This depends also on your swappiness parameter (if swappiness
> > > is higher, inactive processes are swapped out easier, default is 60%
> > > on
> > linux -
> > > freeing more space for FS cache - the backside is of course that
> > > maybe in-memory structures of Lucene and other applications get pages
> out).
> > >
> > > You will only see no paging at all if all memory allocated all
> > applications
> > > + all mmapped files fit into memory. But paging in/out the mmapped
> > > Lucene
> > > index is much cheaper than using SimpleFSDirectory or
> > NIOFSDirectory. If
> > > you use SimpleFS or NIO and your index is not in FS cache, it will
> > > also
> > read
> > > it from physical disk again, so where is the difference. Paging is
> > actually
> > > cheaper as no syscalls are involved.
> > >
> > > If you want as much as possible of your index in physical RAM, copy
> > > it to /dev/null regularly and buy more RAM :-)
> > >
> > >
> > > -
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
> > > eMail: uwe@thetaphi...
> > >
> > >> From: Bill Bell [mailto:billnb...@gmail.com]
> > >> Sent: Friday, July 20, 2012 5:17 AM
> > >> Subject: Re: ...
> > >> stop using it? The least used memory will be removed from the OS
> > >> automatically? I see some paging. Wouldn't paging slow down the
> querying?
> > >
> > >>
> > >> My index

[ANNOUNCE] Apache Solr 3.6.1 released

2012-07-22 Thread Uwe Schindler
22 July 2012, Apache Solr™ 3.6.1 available

The Lucene PMC is pleased to announce the release of Apache Solr 3.6.1.

Solr is the popular, blazing fast open source enterprise search platform
from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search, dynamic clustering, database
integration, rich document (e.g., Word, PDF) handling, and geospatial
search.
Solr is highly scalable, providing distributed search and index replication,
and it powers the search and navigation features of many of the world's
largest internet sites.

This release is a bug fix release for version 3.6.0. It contains numerous
bug fixes, optimizations, and improvements, some of which are highlighted
below.  The release is available for immediate download at:
   http://lucene.apache.org/solr/mirrors-solr-3x-redir.html (see
note below).

See the CHANGES.txt file included with the release for a full list of
details.

Solr 3.6.1 Release Highlights:

 * The concurrency of MMapDirectory was improved, which caused
   a performance regression in comparison to Solr 3.5.0. This affected
   users with 64bit platforms (Linux, Solaris, Windows) or those
   explicitly using MMapDirectoryFactory.

 * ReplicationHandler "maxNumberOfBackups" was fixed to work if backups are
   triggered on commit.

 * Charset problems were fixed with HttpSolrServer, caused by an upgrade to
   a new Commons HttpClient version in 3.6.0.

 * Grouping was fixed to return correct count when not all shards are
   queried in the second pass. Solr no longer throws an Exception when using
   result grouping with main=true and wt=javabin.

 * Config file replication was made less error prone.

 * Data Import Handler threading fixes.

 * Various minor bugs were fixed.

Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases.  It is possible that the mirror you are using may not
have replicated the release yet.  If that is the case, please try another
mirror.  This also goes for Maven access.

Happy searching,

Uwe Schindler (release manager)
& all Lucene/Solr developers

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/





RE: How to setup SimpleFSDirectoryFactory

2012-07-20 Thread Uwe Schindler
Hi Bill,

MMapDirectory uses the file system cache of your operating system, which has
following consequences: In Linux, top & free should normally report only
*little* free memory, because the O/S uses all memory not allocated by
applications to cache disk I/O (and shows it as allocated, so having 0% free
memory is just normal on Linux and also Windows). If you have other
applications or Lucene/Solr itself that allocate lot's of heap space or
malloc() a lot, then you are reducing free physical memory, so reducing fs
cache. This depends also on your swappiness parameter (if swappiness is
higher, inactive processes are swapped out easier, default is 60% on linux -
freeing more space for FS cache - the backside is of course that maybe
in-memory structures of Lucene and other applications get pages out).

You will only see no paging at all if all memory allocated all applications
+ all mmapped files fit into memory. But paging in/out the mmapped Lucene
index is much cheaper than using SimpleFSDirectory or NIOFSDirectory. If
you use SimpleFS or NIO and your index is not in FS cache, it will also read
it from physical disk again, so where is the difference. Paging is actually
cheaper as no syscalls are involved.

If you want as much as possible of your index in physical RAM, copy it to
/dev/null regularly and buy more RAM :-)

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Bill Bell [mailto:billnb...@gmail.com]
> Sent: Friday, July 20, 2012 5:17 AM
> Subject: Re: How to setup SimpleFSDirectoryFactory
> 
> Thanks. Are you saying that if we run low on memory, the MMapDirectory
> will
> stop using it? The least used memory will be removed from the OS
> automatically? I see some paging. Wouldn't paging slow down the querying?
> 
> My index is 10gb and every 8 hours we get most of it in shared memory. The
> memory is 99 percent used, and that does not leave any room for other
> apps.
> Other implications?
> 
> Sent from my mobile device
> 720-256-8076
> 
> On Jul 19, 2012, at 9:49 AM, "Uwe Schindler"  wrote:
> 
> > Read this, then you will see that MMapDirectory will use 0% of your Java
> Heap space or free system RAM:
> >
> > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.htm
> > l
> >
> > Uwe
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> >> -Original Message-
> >> From: William Bell [mailto:billnb...@gmail.com]
> >> Sent: Tuesday, July 17, 2012 6:05 AM
> >> Subject: How to setup SimpleFSDirectoryFactory
> >>
> >> We all know that MMapDirectory is fastest. However we cannot always
> >> use it since you might run out of memory on large indexes right?
> >>
> >> Here is how I got SimpleFSDirectoryFactory to work. Just set -
> >> Dsolr.directoryFactory=solr.SimpleFSDirectoryFactory.
> >>
> >> Your solrconfig.xml:
> >>
> >> <directoryFactory name="DirectoryFactory"
> >> class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
> >>
> >> You can check it with http://localhost:8983/solr/admin/stats.jsp
> >>
> >> Notice that the default for Windows 64bit is MMapDirectory. Else
> >> NIOFSDirectory except for Windows. It would be nicer if we just
> >> set it all up with a helper in solrconfig.xml...
> >>
> >> if (Constants.WINDOWS) {
> >>   if (MMapDirectory.UNMAP_SUPPORTED && Constants.JRE_IS_64BIT)
> >>     return new MMapDirectory(path, lockFactory);
> >>   else
> >>     return new SimpleFSDirectory(path, lockFactory);
> >> } else {
> >>   return new NIOFSDirectory(path, lockFactory);
> >> }
> >>
> >>
> >>
> >> --
> >> Bill Bell
> >> billnb...@gmail.com
> >> cell 720-256-8076
> >
> >




RE: How to setup SimpleFSDirectoryFactory

2012-07-19 Thread Uwe Schindler
Read this, then you will see that MMapDirectory will use 0% of your Java Heap 
space or free system RAM:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: William Bell [mailto:billnb...@gmail.com]
> Sent: Tuesday, July 17, 2012 6:05 AM
> Subject: How to setup SimpleFSDirectoryFactory
> 
> We all know that MMapDirectory is fastest. However we cannot always use it
> since you might run out of memory on large indexes right?
> 
> Here is how I got SimpleFSDirectoryFactory to work. Just set -
> Dsolr.directoryFactory=solr.SimpleFSDirectoryFactory.
> 
> Your solrconfig.xml:
> 
> <directoryFactory name="DirectoryFactory"
> class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
> 
> You can check it with http://localhost:8983/solr/admin/stats.jsp
> 
> Notice that the default for Windows 64bit is MMapDirectory. Else
> NIOFSDirectory except for Windows. It would be nicer if we just set it all
> up
> with a helper in solrconfig.xml...
> 
> if (Constants.WINDOWS) {
>   if (MMapDirectory.UNMAP_SUPPORTED && Constants.JRE_IS_64BIT)
>     return new MMapDirectory(path, lockFactory);
>   else
>     return new SimpleFSDirectory(path, lockFactory);
> } else {
>   return new NIOFSDirectory(path, lockFactory);
> }
> 
> 
> 
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076




Java 7u1 fixes index corruption and crash bugs in Apache Lucene Core and Apache Solr

2011-10-26 Thread Uwe Schindler
Hi users of Apache Lucene Core and Apache Solr,

Oracle released Java 7u1 [1] on October 19. According to the release notes
and tests done by the Lucene committers, all bugs reported on July 28 are
fixed in this release, so code using Porter stemmer no longer crashes with
SIGSEGV. We were not able to experience any index corruption anymore, so it
is safe to use Java 7u1 with Lucene Core and Solr.

On the same day, Oracle released Java 6u29 [2] fixing the same problems
occurring with Java 6, if the JVM switches -XX:+AggressiveOpts or
-XX:+OptimizeStringConcat were used. Of course, you should not use
experimental JVM options like -XX:+AggressiveOpts in production
environments! We recommend everybody to upgrade to this latest version 6u29.

In case you upgrade to Java 7, remember that you may have to reindex, as the
Unicode version shipped with Java 7 changed and tokenization behaves
differently (e.g. lowercasing). For more information, read
JRE_VERSION_MIGRATION.txt in your distribution package!

On behalf of the Apache Lucene/Solr committers,
Uwe Schindler

[1] http://www.oracle.com/technetwork/java/javase/7u1-relnotes-507962.html
[2] http://www.oracle.com/technetwork/java/javase/6u29-relnotes-507960.html

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/




[WARNING] Index corruption and crashes in Apache Lucene Core / Apache Solr with Java 7

2011-07-28 Thread Uwe Schindler
Hello Apache Lucene & Apache Solr users,
Hello users of other Java-based Apache projects,

Oracle released Java 7 today. Unfortunately it contains hotspot compiler
optimizations, which miscompile some loops. This can affect code of several
Apache projects. Sometimes JVMs only crash, but in several cases, results
calculated can be incorrect, leading to bugs in applications (see Hotspot
bugs 7070134 [1], 7044738 [2], 7068051 [3]).

Apache Lucene Core and Apache Solr are two Apache projects, which are
affected by these bugs, namely all versions released until today. Solr users
with the default configuration will have Java crashing with SIGSEGV as soon
as they start to index documents, as one affected part is the well-known
Porter stemmer (see LUCENE-3335 [4]). Other loops in Lucene may be
miscompiled, too, leading to index corruption (especially on Lucene trunk
with pulsing codec; other loops may be affected, too - LUCENE-3346 [5]).

These problems were detected only 5 days before the official Java 7 release,
so Oracle had no time to fix those bugs, affecting also many more
applications. In response to our questions, they proposed to include the
fixes into service release u2 (eventually into service release u1, see [6]).
This means you cannot use Apache Lucene/Solr with Java 7 releases before
Update 2! If you do, please don't open bug reports, it is not the
committers' fault! At least disable loop optimizations using the
-XX:-UseLoopPredicate JVM option to not risk index corruptions.
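
As an illustration only (the exact launch command depends on your servlet
container or start script), the flag goes on the JVM command line that
starts Solr, e.g. for the Jetty example distribution:

```
java -XX:-UseLoopPredicate -jar start.jar
```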

Please note: Also Java 6 users are affected, if they use one of those JVM
options, which are not enabled by default: -XX:+OptimizeStringConcat or
-XX:+AggressiveOpts

It is strongly recommended not to use any hotspot optimization switches in
any Java version without extensive testing!

In case you upgrade to Java 7, remember that you may have to reindex, as the
Unicode version shipped with Java 7 changed and tokenization behaves
differently (e.g. lowercasing). For more information, read
JRE_VERSION_MIGRATION.txt in your distribution package!

On behalf of the Lucene project,
Uwe

[1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134
[2] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7044738
[3] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7068051
[4] https://issues.apache.org/jira/browse/LUCENE-3335
[5] https://issues.apache.org/jira/browse/LUCENE-3346
[6] http://s.apache.org/StQ

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/




RE: Solr 3.1 / Java 1.5: Exception regarding analyzer implementation

2011-05-10 Thread Uwe Schindler
Hi,

> On 09.05.11 11:04, Martin Jansen wrote:
> > I just attempted to set up an instance of Solr 3.1 in Tomcat 5.5
> > running in Java 1.5.  It fails with the following exception on start-up:
> >
> >> java.lang.AssertionError: Analyzer implementation classes or at least
> >> their tokenStream() and reusableTokenStream() implementations must
> be
> >> final at
> >> org.apache.lucene.analysis.Analyzer.assertFinal(Analyzer.java:57)
> 
> In the meantime I solved the issue by installing Java 1.6.  Works without
a
> problem now, but I'm wondering if Solr 3.1 is intentionally incompatible
with
> Java 1.5 or if it happened by mistake.

Solr 3.1 is compatible with Java 1.5 and runs fine with that. The exception
you are seeing should not happen for Analyzers that are shipped with
Solr/Lucene, they can only happen if you wrote your own
Analyzer/TokenStreams that are not declared final as requested. In that case
the error will also happen with Java 6.

BUT: This is only an assertion to make development and debugging easier. The
assertions should not run in production mode, as they may affect performance
(seriously)! You should check your java command line for -ea parameters and
remove them on production.

The reason why this assert hits you in one of your tomcat installations
could also be related to some instrumentation tools you have enabled in this
tomcat. Lots of instrumentation tools may dynamically change class bytecode
and e.g. make classes non-final. In that case the assertion of course fails
(with
assertions enabled). Before saying Solr 3.1 is not compatible with Java 1.5:

- Disable assertions in production (by removing -ea command line parameters,
see http://download.oracle.com/javase/1.4.2/docs/guide/lang/assert.html)
- Check your configuration if you have some instrumentation enabled.

Both of the above points may not affect you on the other server that runs
fine with Java 6.
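
A quick way to verify whether assertions are enabled in a given JVM is the
standard side-effect idiom (the class name below is illustrative, not part
of Solr):

```java
public class AssertCheck {
    // Returns true only when the JVM was started with -ea/-enableassertions:
    // the assignment inside the assert statement is evaluated only in that case.
    public static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true; // intentional side effect; a no-op without -ea
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```

Run this inside the same container configuration as Solr: if it reports
true, remove the -ea flags before going to production.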

Uwe