Re: [ANNOUNCE] Apache Solr 8.5.0 released

2020-03-24 Thread Hasan Diwan
Congrats! -- H

On Tue, 24 Mar 2020 at 05:32, Alan Woodward  wrote:

> ## 24 March 2020, Apache Solr™ 8.5.0 available
>
> The Lucene PMC is pleased to announce the release of Apache Solr 8.5.0.
>
> Solr is the popular, blazing fast, open source NoSQL search platform from
> the Apache Lucene project. Its major features include powerful full-text
> search, hit highlighting, faceted search, dynamic clustering, database
> integration, rich document handling, and geospatial search. Solr is highly
> scalable, providing fault tolerant distributed search and indexing, and
> powers the search and navigation features of many of the world's largest
> internet sites.
>
> Solr 8.5.0 is available for immediate download at:
>
>   
>
> ### Solr 8.5.0 Release Highlights:
>
>  * A new queries property of the JSON Request API lets you declare
> queries in Query DSL format and refer to them by name.
>  * A new command line tool bin/postlogs allows you to index Solr logs into
> a Solr collection. This is helpful for log analysis and troubleshooting.
> Documentation is not yet integrated into the Solr Reference Guide, but is
> available in a branch via GitHub:
> https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/logs.adoc
> .
>  * A new stream decorator delete() is available to help solve some issues
> with traditional delete-by-query, which can be expensive in large indexes.
>  * Solr now has the ability to run with a Java Security Manager enabled.
>
> Please read CHANGES.txt for a full list of changes:
>
>   
>
> Solr 8.5.0 also includes improvements and bugfixes in the corresponding
> Apache Lucene release:
>
>   
>
>

-- 
OpenPGP:
https://sks-keyservers.net/pks/lookup?op=get&search=0xFEBAD7FFD041BBA1
If you wish to request my time, please do so using
*bit.ly/hd1AppointmentRequest
*.
Si vous voulez faire connaissance, allez à *bit.ly/hd1AppointmentRequest
*.

Sent
from my mobile device
Envoye de mon portable


Re: [ANNOUNCE] Apache Solr 8.1.0 released

2019-05-16 Thread Hasan Diwan
On Thu, 16 May 2019 at 00:43, Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> = 16 March 2019, Apache Solr™ 8.1.0 available =
>
> The Lucene PMC is pleased to announce the release of Apache Solr 8.1.0
>

Congrats to all involved! -- H


Re: Removing words like "FONT-SIZE: 9pt; FONT-FAMILY: arial" from content

2018-12-31 Thread Hasan Diwan
Perhaps https://royvanrijn.com/blog/2016/03/java-mail-message-as-download/
may be helpful? Though I see the date on it and am now unsure. -- H

On Mon, 31 Dec 2018 at 17:51, Zheng Lin Edwin Yeo 
wrote:

> Hi Alex,
>
> I have tried with a file that is HTML formatted, with those tags like
> , , , etc., and those get removed during indexing.
>
> For tags like "*FONT-SIZE: 9pt; FONT-FAMILY: arial*", I found that in the
> EML file, there are two different content types, text/html and text/plain.
> Could it be due to Tika getting the content type from text/html instead of
> text/plain?
>
> Regards,
> Edwin
>
> On Mon, 31 Dec 2018 at 23:52, Alexandre Rafalovitch 
> wrote:
>
> > EML is for emails, so there are probably some HTML-formatted emails
> > that you are getting. Probably with the alternative text-part. Outlook
> > would render HTML and/or use text part. I think you can just open EML
> > in an editor to check it out.
> >
> > As to URP, are you absolutely sure it is being used? It is not
> > declared as default, so you need to call it explicitly. Try setting a
> > field in there or some other clear flag that a record has been
> > processed.
> >
> > Regards,
> > Alex.
> >
> > On Sun, 30 Dec 2018 at 22:46, Zheng Lin Edwin Yeo 
> > wrote:
> > >
> > > These texts are likely from the original EML file data, but they are
> not
> > > visible in the content when the EML file is opened in Microsoft
> Outlook.
> > >
> > > I have already applied the HTMLStripFieldUpdateProcessorFactory in
> > > solrconfig.xml, but these texts are still showing up in the index.
> Below
> > is
> > > my configuration.
> > >
> > > <updateRequestProcessorChain>
> > >
> > >   <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
> > >
> > >     <str name="fieldName">content_tcs</str>
> > >
> > >   </processor>
> > >
> > >   <processor class="solr.LogUpdateProcessorFactory" />
> > >
> > >   <processor class="solr.RunUpdateProcessorFactory" />
> > >
> > > </updateRequestProcessorChain>
> > >
> > >
> > > Regards,
> > > Edwin
> > >
> > > On Mon, 31 Dec 2018 at 11:29, Alexandre Rafalovitch <
> arafa...@gmail.com>
> > > wrote:
> > >
> > > > Specifically, a custom Update Request Processor chain can be used
> > before
> > > > indexing. Probably with HTMLStripFieldUpdateProcessorFactory
> > > > Regards,
> > > >  Alex
> > > >
> > > > On Sun, Dec 30, 2018, 9:26 PM Vincenzo D'Amore  > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I think this kind of text manipulation should be done before
> > indexing, if
> > > > > you have font-size font-family in your text, very likely you’re
> > indexing
> > > > an
> > > > > html with css.
> > > > > If I’m right, you’re just entering in a hell of words that should
> be
> > > > > removed from your text.
> > > > >
> > > > > On the other hand, if you have to do this at index time, a quick
> and
> > > > dirty
> > > > > solution is using the pattern-replace filter.
> > > > >
> > > > >
> > > > >
> > > >
> >
> https://lucene.apache.org/solr/guide/7_5/filter-descriptions.html#pattern-replace-filter
> > > > >
> > > > > Ciao,
> > > > > Vincenzo
> > > > >
> > > > > --
> > > > > mobile: 3498513251
> > > > > skype: free.dev
> > > > >
> > > > > > On 31 Dec 2018, at 02:47, Zheng Lin Edwin Yeo <
> > edwinye...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I noticed that during the indexing of EML files, there are words
> > like
> > > > > > "*FONT-SIZE:
> > > > > > 9pt; FONT-FAMILY: arial*" that are being indexed into the content
> > as
> > > > > well.
> > > > > >
> > > > > > Would like to check, how are we able to remove those words during
> > the
> > > > > > indexing?
> > > > > >
> > > > > > I am using Solr 7.5.0
> > > > > >
> > > > > > Regards,
> > > > > > Edwin
> > > > >
> > > >
> >
>




Re: [ANNOUNCE] Apache Lucene 6.6.5 released

2018-07-03 Thread Hasan Diwan
Congrats to all! -- H
On Tue, 3 Jul 2018 at 14:29, Ishan Chattopadhyaya
 wrote:
>
> 3 July 2018, Apache Lucene™ 6.6.5 available
>
> The Lucene PMC is pleased to announce the release of Apache Lucene 6.6.5.
>
> Apache Lucene is a high-performance, full-featured text search engine
> library written entirely in Java. It is a technology suitable for nearly
> any application that requires full-text search, especially cross-platform.
>
> This release contains one bug fix. The release is available for immediate
> download at:
> http://lucene.apache.org/core/mirrors-core-latest-redir.html
>
> Further details of changes are available in the change log available at:
> http://lucene.apache.org/core/6_6_5/changes/Changes.html
>
> Please report any feedback to the mailing lists (
> http://lucene.apache.org/core/discussion.html)
>
> Note: The Apache Software Foundation uses an extensive mirroring network
> for distributing releases. It is possible that the mirror you are using may
> not have replicated the release yet. If that is the case, please try
> another mirror. This also applies to Maven access.





Re: INVALID in email address

2016-10-13 Thread Hasan Diwan
The "From" field in your email client is set to that. However, your
"Reply-To" is correct. As to the deeper reason "why", it's a fairly
easy-to-defeat workaround for spam. -- H

On 13 October 2016 at 01:39,  wrote:

> Anyone know why this appears after my email address when I reply to a
> thread in the user group?






Re: Frage zu einem komischen Verhalten

2015-09-02 Thread Hasan Diwan
You might get a better response in English...

Vielleicht haben Sie eine bessere Antwort bekommen in... (from Google
Translate, as my own German is non-existent) -- H

2015-09-02 2:05 GMT-07:00 Long Yan :

> Guten Tag,
> ich habe einen Core mit dem folgendem Befehl erstellt
> bin\solr create -c mycore
>
> Wenn ich die Datei film.csv unter solr-5.2.1\example\films\ indexiere,
> kann solr nur bis die Zeile
> "2046,Wong Kar-wai,Romance Film|Fantasy|Science
> Fiction|Drama,,/en/2046_2004,2004-05-20" indexieren.
>
> Aber wenn ich zuerst die Datei books.csv unter
> solr-5.2.1\example\exampledocs und danach film.csv indexiere,
> kann solr alle Zeilen in film.csv indexieren.
>
> Kann Jemand mir bitte Hinweis geben, woran es liegen könnte?
>
> Grüße
> Long Yan
>





Re: DIH import from MySQL results in garbage text for special chars

2012-09-27 Thread Hasan Diwan
Mr Prakash,

On 27 September 2012 02:06, Pranav Prakash pra...@gmail.com wrote:

 +--------------------------+--------+
 | Variable_name            | Value  |
 +--------------------------+--------+
 | character_set_client     | latin1 |
 | character_set_connection | latin1 |
 | character_set_database   | latin1 |
 | character_set_filesystem | binary |
 | character_set_results    | latin1 |
 | character_set_server     | latin1 |
 | character_set_system     | utf8   |
 +--------------------------+--------+

These should all be the same (presumably the system encoding).  -- H
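A quick illustration of why a latin1/utf8 mismatch surfaces as "garbage text" (a toy Python sketch, not specific to MySQL):

```python
# UTF-8 bytes decoded as latin1 turn each accented character into two
# mojibake characters -- the classic symptom of a connection-charset mismatch.
original = "café"
wire_bytes = original.encode("utf-8")   # the bytes the server actually sends
mangled = wire_bytes.decode("latin-1")  # what a latin1-configured client sees
print(mangled)  # 'cafÃ©'
roundtrip = mangled.encode("latin-1").decode("utf-8")
print(roundtrip == original)  # the bytes were fine; only the labels disagreed
```

Aligning character_set_client/connection/results on the connection (e.g. `SET NAMES utf8`) removes the mismatch.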


Re: Importing of unix date format from mysql database and dates of format 'Thu, 06 Sep 2012 22:32:33 +0000' in Solr 4.0

2012-09-06 Thread Hasan Diwan
http://www.electrictoolbox.com/article/mysql/format-date-time-mysql/ hth --
H
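For what it's worth, the RFC-2822 form in the subject line can be converted to the ISO-8601 UTC form Solr's date field expects; a Python sketch (variable names are illustrative):

```python
from email.utils import parsedate_to_datetime

# Parse the RFC-2822 date from the question and render it in the
# ISO-8601 UTC shape that Solr date fields accept.
raw = "Thu, 06 Sep 2012 22:32:33 +0000"
solr_date = parsedate_to_datetime(raw).strftime("%Y-%m-%dT%H:%M:%SZ")
print(solr_date)  # 2012-09-06T22:32:33Z
```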
On 6 Sep 2012 17:23, kiran chitturi chitturikira...@gmail.com wrote:

 Hi,

 I am using Solr with DIH and started getting errors when the database
 time/date fields are getting imported in to Solr. I have used the date as
 the field type but when i looked up at the docs it looks like the date
 field does not accept (Thu, 06 Sep 2012 22:32:33 +0000) or (1346976590)
 formats.

 Also, When i used field_type as 'text_ar' and indexed a line with arabic
 text, Solr is displaying some non-ISO characters. It looks like the text is
 not being unicoded.

 Did anyone face a similar issue ? The Solr date field type does not support
 a variety of formats.

 Is there any work around to solve this kind of issues ?

 Many Thanks,
 --
 Kiran Chitturi



Re: Debugging DIH

2012-08-26 Thread Hasan Diwan
Mr Norskog, et al,

On 26 August 2012 14:37, Lance Norskog goks...@gmail.com wrote:

 Also, there is a logging feature to print intermediate values.


I see the data as it should be. It's just not recorded into SOLR. One
possible concern is that I have timestamp in epoch seconds, which I'd like
to store as a date on the SOLR side; I know I can apply a transformer to do
this, but what's the format for it? Many thanks! -- H
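Whatever transformer is used, the target shape is Solr's ISO-8601 UTC date string; a Python sketch of the epoch-seconds conversion (the function name here is illustrative):

```python
from datetime import datetime, timezone

def epoch_to_solr_date(epoch_seconds: int) -> str:
    """Render epoch seconds in the ISO-8601 UTC form Solr date fields expect."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(epoch_to_solr_date(0))  # 1970-01-01T00:00:00Z
```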


Debugging DIH

2012-08-24 Thread Hasan Diwan
I have some data in an H2 database that I'd like to move to SOLR. I
probably should/could extract and post the contents as 1 new document per
record, but I'd like to configure the data import handler and am having
some difficulty doing so. Following the wiki instructions[1], I have the
following in my db-data-config.xml:
<dataConfig>
<dataSource type="JdbcDataSource" driver="org.h2.Driver"
url="jdbc:h2:tcp://192.168.1.6/finance" user="sa" />
<document>
  <entity name="receipt" query="select location as location, amount as
amount, done_on as when from RECEIPTS as r join APP_USERS as a on r.user_id
= a.id"/>
</document>
</dataConfig>

I also have dropped the JDBC driver into db/lib, witness:
% jar tvf ./lib/h2-1.3.164.jar | grep 'Driver'
13 Fri Feb 03 12:02:56 PST 2012 META-INF/services/java.sql.Driver
  2508 Fri Feb 03 12:02:56 PST 2012 org/h2/Driver.class
   485 Fri Feb 03 12:02:56 PST 2012 org/h2/util/DbDriverActivator.class

and I've added the appropriate fields to schema.xml:
  <field name="location" type="string" indexed="true" stored="true"/>
  <field name="amount" type="currency" indexed="true" stored="true"/>
  <field name="when" type="date" indexed="true" stored="true"/>

There's nothing in my index and 343 rows in my table. What is going on? -- H
1. http://wiki.apache.org/solr/DIHQuickStart


Re: Debugging DIH

2012-08-24 Thread Hasan Diwan
On 24 August 2012 07:17, Hasan Diwan hasan.di...@gmail.com wrote:

 I have some data in an H2 database that I'd like to move to SOLR. I
 probably should/could extract and post the contents as 1 new document per
 record, but I'd like to configure the data import handler and am having
 some difficulty doing so. Following the wiki instructions[1], I have the
 following in my db-data-config.xml:
 <dataConfig>
 <dataSource type="JdbcDataSource" driver="org.h2.Driver"
 url="jdbc:h2:tcp://192.168.1.6/finance" user="sa" />
 <document>
   <entity name="receipt" query="select location as location, amount as
 amount, done_on as when from RECEIPTS as r join APP_USERS as a on r.user_id
 = a.id"/>
 </document>
 </dataConfig>

 I also have dropped the JDBC driver into db/lib, witness:
 % jar tvf ./lib/h2-1.3.164.jar | grep 'Driver'
 13 Fri Feb 03 12:02:56 PST 2012 META-INF/services/java.sql.Driver
   2508 Fri Feb 03 12:02:56 PST 2012 org/h2/Driver.class
485 Fri Feb 03 12:02:56 PST 2012 org/h2/util/DbDriverActivator.class

 and I've added the appropriate fields to schema.xml:
   <field name="location" type="string" indexed="true" stored="true"/>
   <field name="amount" type="currency" indexed="true" stored="true"/>
   <field name="when" type="date" indexed="true" stored="true"/>

 There's nothing in my index and 343 rows in my table. What is going on? --
 H


One more data point:
% curl -L "http://localhost:8983/solr/db/dataimport?command=status"

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int
name="QTime">0</int></lst><lst name="initArgs"><lst name="defaults"><str
name="config">db-data-config.xml</str></lst></lst><str
name="command">status</str><str name="status">idle</str><str
name="importResponse"/><lst name="statusMessages"><str name="Total Requests
made to DataSource">1</str><str name="Total Rows Fetched">343</str><str
name="Total Documents Skipped">0</str><str name="Full Dump
Started">2012-08-24 07:19:26</str><str name="">Indexing completed.
Added/Updated: 0 documents. Deleted 0 documents.</str><str
name="Committed">2012-08-24 07:19:26</str><str name="Optimized">2012-08-24
07:19:26</str><str name="Total Documents Processed">0</str><str name="Time
taken ">0:0:0.328</str></lst><str name="WARNING">This response format is
experimental.  It is likely to change in the future.</str>
</response>





Re: Trending topics?

2012-08-02 Thread Hasan Diwan
Tor,
I hope that the information in
http://www.jason-palmer.com/2011/05/creating-a-tag-cloud-with-solr-and-php/
helps. -- H
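The facet-vs-term-vector distinction Lance describes below can be sketched in a few lines of Python (toy documents, names are illustrative):

```python
from collections import Counter

docs = ["swimming tennis swimming", "tennis basketball", "swimming"]

# Facet-style: in how many documents does each term appear?
doc_freq = Counter()
for d in docs:
    doc_freq.update(set(d.split()))

# Term-vector-style: how often does each term appear inside one document?
term_vectors = [Counter(d.split()) for d in docs]

print(doc_freq["swimming"], doc_freq["basketball"])  # 2 1
print(term_vectors[0]["swimming"])                   # 2
```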
On 2 August 2012 15:48, Lance Norskog goks...@gmail.com wrote:

 Two easy ones:
 1) Facets on a text field are simple word counts by document.
 2) If you want the number of times a word appears inside a document,
 that requires a separate dataset called a 'term vector'. This is a
 list of all words in a document with a count for each one.
 These are simple queries. There are also batch computations where you
 create a 'term-document matrix', with a row for each document and a
 column for all terms that appear in any document. These computations
 require exporting all of your data into a separate computation.



 On Thu, Aug 2, 2012 at 1:26 PM, Chris Dawson xrdaw...@gmail.com wrote:
  Tor,
 
  Thanks for your response.
 
  I'd like to put an arbitrary set of text into Solr and then have Solr
 tell
  me the ten most popular topics that are in there.  For example, if I
 put
  in 100 paragraphs of text about sports, I would like to retrieve topics
  like swimming, basketball, tennis if the three most popular and
 discussed
  topics are those inside the text.
 
  Is Solr the correct tool to do something like this?  Or, is this too
  unstructured to get this kind of result without manually categorizing it?
 
  Is the correct term for this faceting?  It seems to me that faceting
  requires putting the data into a more structured format (for example,
  telling the index that this is the manufacturer, etc.)
 
  Basically, I would like to get something like a tag cloud (relevant
 topics
  with weights for each term) without asking users to tag things manually.
 
  Chris
 
  On Thu, Aug 2, 2012 at 3:25 PM, Tor Henning Ueland 
 tor.henn...@gmail.comwrote:
 
  On Thu, Aug 2, 2012 at 5:34 PM, Chris Dawson xrdaw...@gmail.com
 wrote:
 
   How would I generate a list of trending topics using solr?
  
 
 
  By putting them in solr.
  (Generic question get at generic answer)
 
  What do you mean? Trending searches, trending data, trending documents,
  trending what?
 
 
  --
  Regards
  Tor Henning Ueland
 
 
 
 
  --
  Chris Dawson
  971-533-8335
  Human potential, travel and entrepreneurship:  http://webiphany.com/
  Traveling to Portland, OR?  http://www.airbnb.com/rooms/58909



 --
 Lance Norskog
 goks...@gmail.com






Re: geospacial / coordinates java example anyone?

2012-07-24 Thread Hasan Diwan
On 24 July 2012 09:55, yair even-zohar yair...@yahoo.com wrote:

 Can someone please send a simple java example for indexing and querying a
 latitude, longitude coordinate on SolrDocument.
 That is, assume we have a document and we want to simply add the lat,lon
 as field to the document and then query according to distance too

 There are examples on the wiki. Please see
http://wiki.apache.org/solr/SpatialSearch/ -- H
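On the query side, the wiki's distance filter boils down to a handful of HTTP parameters; a Python sketch of building them (the field name "store" is an assumption):

```python
from urllib.parse import urlencode

# Build the parameters for a Solr spatial filter query as described on the
# SpatialSearch wiki: documents within d km of point pt on field sfield.
params = urlencode({
    "q": "*:*",
    "fq": "{!geofilt}",    # spatial filter query parser
    "sfield": "store",     # assumed name of the location field
    "pt": "45.15,-93.85",  # latitude,longitude
    "d": "5",              # distance in km
})
print(params)
```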



Re: Why is Solr still shipped with Jetty 6 / switching to Jetty 8?

2012-06-23 Thread Hasan Diwan
On 23 June 2012 17:20, Lance Norskog goks...@gmail.com wrote:

 Solr does not ship with anything as a product. It includes an old
 Jetty in the example. The feature set in Jetty has not held back
 showing an example of Solr features, so it has not been a priority. If
 you supply a patch, that is different :) Especially if the patch shows
 how to get all of the Jetty dependencies via Ivy.


Ivy? Here you are:
<ivy-module version="2.0">
<!-- this goes in ivy.xml -->
<info organisation="org.apache" module="hello-ivy"/>
<dependencies>
<!-- found from http://mvnrepository.com/ -->
<dependency org="org.eclipse.jetty" name="jetty-servlet"
rev="8.1.4.v20120524"/>
</dependencies>
</ivy-module>



JDBC import yields no data

2012-04-24 Thread Hasan Diwan
I'm trying to migrate from RDBMS to the Lucene ecosystem. To do this, I'm
trying to use the JDBC importer[1]. My configuration is given below:
<dataConfig>
  <dataSource driver="net.sf.log4jdbc.DriverSpy" user="sa"
url="jdbc:log4jdbc:h2:tcp://192.168.1.6/finance"/>
  <!-- <dataSource driver="org.h2.Driver" url="jdbc:h2:tcp://
192.168.1.6/finance" user="sa" /> -->
<document>
<entity name="receipt" query="SELECT 'transaction' as type,
currency, name, amount, done_on from receipts join app_users on user_id =
app_users.id"
  deltaQuery="SELECT 'transaction' as type, name, currency, amount,
done_on from receipts join app_users on user_id = app_users.id where
done_on &gt; '${dataimporter.last_index_time}'">
<field column="NAME" name="name" />
<field column="NAME" name="nameSort" />
<field column="NAME" name="alphaNameSort" />
<field column="AMOUNT" name="amount" /> <!-- currencyField not
available till 3.6 -->
<field column="transaction_time" name="done_on" /> <!-- resolve
epoch time -->
<field column="location" name="location"/> <!-- geospatial?? -->
</entity>
</document>
</dataConfig>
And the resulting query of *:*:
% curl "http://192.168.1.6:8995/solr/db/select/?q=*%3A*"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int
name="QTime">1</int><lst name="params"><str
name="q">*:*</str></lst></lst><result name="response" numFound="0"
start="0"/>
</response>
The SQL query does work properly, the relevant jars are in the lib
subdirectory. Help? -- H
1. http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource


Re: JDBC import yields no data

2012-04-24 Thread Hasan Diwan
On 24 April 2012 07:49, Dyer, James james.d...@ingrambook.com wrote:

 You might also want to show us your dataimport handler configuration
 from solrconfig.xml and also the url you're using to start the data import.
  When its complete, browsing to 
 http://192.168.1.6:8995/solr/db/dataimport; (or whatever the DIH handler
 name is in your config) should say indexing complete and also the number
 of documents it imported.  Also, if you have commit=false in your config,
 it won't issue a commit so you won't see the documents.


solrconfig.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<config>

  <luceneMatchVersion>LUCENE_35</luceneMatchVersion>

  <jmx />

  <!-- Set this to 'false' if you want solr to continue working after it has
       encountered an severe configuration error.  In a production environment,
       you may want solr to keep working even if one handler is mis-configured.

       You may also set this to false using by setting the system property:
         -Dsolr.abortOnConfigurationError=false
  -->
  <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>

  <lib dir="../../../../dist/"
       regex="apache-solr-dataimporthandler-.*\.jar" />

  <indexDefaults>
    <!-- Values here affect all index writers and act as a default unless overridden. -->
    <useCompoundFile>false</useCompoundFile>

    <mergeFactor>10</mergeFactor>
    <!--
     If both ramBufferSizeMB and maxBufferedDocs is set, then Lucene will
     flush based on whichever limit is hit first.
    -->
    <!--<maxBufferedDocs>1000</maxBufferedDocs>-->
    <!-- Tell Lucene when to flush documents to disk.
     Giving Lucene more memory for indexing means faster indexing at the
     cost of more RAM

     If both ramBufferSizeMB and maxBufferedDocs is set, then Lucene will
     flush based on whichever limit is hit first.
    -->
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>

    <!--
     Expert:
     The Merge Policy in Lucene controls how merging is handled by Lucene.
     The default in 2.3 is the LogByteSizeMergePolicy, previous
     versions used LogDocMergePolicy.

     LogByteSizeMergePolicy chooses segments to merge based on their size.
     The Lucene 2.2 default, LogDocMergePolicy chose when
     to merge based on number of documents

     Other implementations of MergePolicy must have a no-argument constructor
    -->
    <!--<mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>-->

    <!--
     Expert:
     The Merge Scheduler in Lucene controls how merges are performed.  The
     ConcurrentMergeScheduler (Lucene 2.3 default) can perform merges in the
     background using separate threads.  The SerialMergeScheduler (Lucene 2.2
     default) does not.
    -->
    <!--<mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>-->

    <!--
      As long as Solr is the only process modifying your index, it is
      safe to use Lucene's in process locking mechanism.  But you may
      specify one of the other Lucene LockFactory implementations in
      the event that you have a custom situation.

      none = NoLockFactory (typically only used with read only indexes)
      single = SingleInstanceLockFactory (suggested)
      native = NativeFSLockFactory
      simple = SimpleFSLockFactory

      ('simple' is the default for backwards compatibility with Solr 1.2)
    -->
    <lockType>single</lockType>
  </indexDefaults>

  <mainIndex>
    <!-- options specific to the main on-disk lucene index -->
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <mergeFactor>10</mergeFactor>
    <!-- Deprecated -->
    <!--<maxBufferedDocs>1000</maxBufferedDocs>-->
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>

    <!-- If true, unlock any held write or commit locks on startup.
         This defeats the locking mechanism that allows multiple
         processes to safely access a lucene index, and should be
         used with care.
         This is not needed if lock type is 'none' or 'single'
    -->
    <unlockOnStartup>false</unlockOnStartup>
  </mainIndex>

  <!-- the default high-performance update handler -->
  updateHandler