Fwd: solr AND rich Data

2016-09-19 Thread kostali hassan
I index rich data in Solr 5.4.1 and use Solarium to search terms in the index
on the field text. How can I display, for each term, its category, synonyms,
similar results, suggester, autocomplete, etc.?
For example, with the search term q=java:
similar terms: javascript, javaEE...
frameworks: Hibernate, JBoss, Struts, Spring...
category: Informatique
no synonym
I am developing this interface in PHP using the CakePHP framework.
Each document in the index has two fields: id (the path of each file, MS Word
and PDF) and the field text.
What is the best approach to build an interface displaying all this
information for each term?
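For the suggester/autocomplete part of this, a hedged sketch of the standard Solr 5.x SuggestComponent configuration in solrconfig.xml could look like the following; the field and fieldType names (text, text_general) are assumptions, not taken from the actual schema:

```xml
<!-- Sketch only: a suggester built from the indexed "text" field. -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">text</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">mySuggester</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

The PHP interface could then query /suggest with suggest.q=java and render the returned suggestions alongside the normal search results.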


solr AND rich Data

2016-09-05 Thread kostali hassan
I index rich data in Solr 5.4.1 and use Solarium to search terms in the index
on the field text. How can I display, for each term, its category, synonyms,
similar results, suggester, autocomplete, etc.?
For example, with the search term q=java:
similar terms: javascript, javaEE...
frameworks: Hibernate, JBoss, Struts, Spring...
category: Informatique
no synonym
I am developing this interface in PHP using the CakePHP framework.
Each document in the index has two fields: id (the path of each file, MS Word
and PDF) and the field text.
What is the best approach to build an interface displaying all this
information for each term?


Re: index sql databases

2016-07-19 Thread kostali hassan
I want to display, for each user:
l'utilisateur est créé le $date à $time ("the user was created on $date at $time")
and not:
$document->name est créé le $document->created

2016-07-18 16:48 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:

> I don't see how that relates to the original
> question.
>
> bq: when I display the field type date I get the
> value in this form yyyy-MM-dd'T'hh:mm:ss'Z'
>
> A regex in the _input_ side will have
> no effect on what Solr returns. You'd have
> to use a DocTransformer to change the output
> on the query side. DIH is in the indexing side.
>
> Best,
> Erick
>
> On Mon, Jul 18, 2016 at 2:45 AM, kostali hassan
> <med.has.kost...@gmail.com> wrote:
> > can we use transformer="RegexTransformer"
> > and set in db_data_config.xml
> >   > groupNames="date_t,time_t" />
> >
> > 2016-07-16 18:18 GMT+01:00 Shawn Heisey <apa...@elyograg.org>:
> >
> >> On 7/15/2016 3:10 PM, kostali hassan wrote:
> >> > Thank you Shawn, the problem is when I display the date field I get
> >> > the value in this form yyyy-MM-dd'T'hh:mm:ss'Z'
> >>
> >> Solr only displays ISO date format for date fields -- an example is
> >> 2016-07-16T18:17:08.497Z -- and only in the UTC timezone.  If you want
> >> something else in your application, you'll have to translate it, or
> >> you'll have to write a custom plugin to add to Solr that changes the
> >> output format.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>


More like this in solr5.4.1

2016-07-19 Thread kostali hassan
I want to introduce MoreLikeThis to get similar documents for each query.
I have indexed rich data (PDF and MS Word). I guess the field to use for
similarity is content, which is also used for highlighting document content.
In my case, what is the best way to build MLT: the MoreLikeThisHandler,
or the MoreLikeThisComponent in a SearchHandler?
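For the first option mentioned above, a hedged sketch of a dedicated MoreLikeThisHandler in solrconfig.xml could look like this; the mlt.fl value "content" follows the guess in the message and should be adjusted to the real field name:

```xml
<!-- Sketch only: a /mlt endpoint; query it with a q= matching one
     document, and it returns similar documents based on mlt.fl. -->
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">content</str>
    <str name="mlt.mintf">1</str>
    <str name="mlt.mindf">1</str>
  </lst>
</requestHandler>
```

The component route instead adds mlt=true and mlt.fl=content parameters to the normal /select handler, returning similar documents alongside each search result.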


Re: index sql databases

2016-07-18 Thread kostali hassan
Can we use transformer="RegexTransformer"
and set it in db_data_config.xml?
 

2016-07-16 18:18 GMT+01:00 Shawn Heisey <apa...@elyograg.org>:

> On 7/15/2016 3:10 PM, kostali hassan wrote:
> > Thank you Shawn, the problem is when I display the date field I get
> > the value in this form yyyy-MM-dd'T'hh:mm:ss'Z'
>
> Solr only displays ISO date format for date fields -- an example is
> 2016-07-16T18:17:08.497Z -- and only in the UTC timezone.  If you want
> something else in your application, you'll have to translate it, or
> you'll have to write a custom plugin to add to Solr that changes the
> output format.
>
> Thanks,
> Shawn
>
>
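As the replies suggest, the translation from Solr's ISO date to the desired display string can happen in the application. A standalone Java sketch of that conversion (the UTC zone choice and the exact wording are assumptions, and the original application is PHP, so this only illustrates the idea):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Sketch: turn Solr's ISO-8601 UTC value into the display string
// "l'utilisateur est créé le <date> à <time>".
public class SolrDateDisplay {
    static String display(String solrDate) {
        // Solr returns e.g. 2016-07-16T18:17:08.497Z (always UTC)
        ZonedDateTime local = Instant.parse(solrDate).atZone(ZoneId.of("UTC"));
        String date = local.format(DateTimeFormatter.ofPattern("yyyy-MM-dd"));
        String time = local.format(DateTimeFormatter.ofPattern("HH:mm:ss"));
        return "l'utilisateur est créé le " + date + " à " + time;
    }

    public static void main(String[] args) {
        System.out.println(display("2016-07-16T18:17:08.497Z"));
        // l'utilisateur est créé le 2016-07-16 à 18:17:08
    }
}
```

The equivalent in PHP would use DateTime::createFromFormat or strtotime on the stored value at render time.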


Re: index sql databases

2016-07-15 Thread kostali hassan
Thank you Shawn. The problem is when I display the date field I get the
value in this form yyyy-MM-dd'T'hh:mm:ss'Z'


index sql databases

2016-07-15 Thread kostali hassan
I use Solr 5.4.1. When a date attribute is null (:00:00) the indexing
process stops and the log shows an error. What do I have to change with
driver="com.mysql.jdbc.Driver" to ignore null dates?
Last question: how do I set the date field to <yyyy-MM-dd> and the time
field to <hh:mm:ss>?
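One possible approach to the null-date failure (an assumption based on MySQL Connector/J's documented zeroDateTimeBehavior connection property, not something confirmed in this thread) is to make the driver return NULL instead of failing on zeroed dates, via the JDBC URL in the DIH dataSource:

```xml
<!-- Sketch only: database name and credentials are placeholders. -->
<dataSource driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/mydb?zeroDateTimeBehavior=convertToNull"
            user="user" password="password"/>
```

With convertToNull, rows with '0000-00-00' dates arrive as NULL and can be skipped or defaulted in the entity query instead of aborting the import.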


DIH:damaged files

2016-07-14 Thread kostali hassan
I try to index many MS Word and PDF files using Solr 5.4.1.
In the Solr log I get only the description of the ERROR, not the file that
caused it. How can I get a list of the corrupt files that Tika cannot index?
And even if Solr tries to index a corrupt file and fails, how do I force it
to continue indexing the next file? In the DIH handler's
tika_data_config.xml I wrote onError="skip" or onError="continue", but
neither works: indexing stops when Tika tries to index the first corrupt
file.
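One way to get that list before DIH runs at all: since .docx files are OOXML (ZIP) containers, a broken one usually fails to open as a ZIP archive, which is exactly what POI later rejects. A standalone Java sketch (not part of Solr or DIH; the directory path is illustrative) that pre-scans a folder and lists the unreadable .docx files:

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipFile;

// Sketch: list .docx files that are not valid ZIP archives, so they
// can be removed or repaired before the DIH/Tika import runs.
public class CorruptDocxScanner {
    static boolean isValidZip(File f) {
        try (ZipFile zf = new ZipFile(f)) {
            return zf.size() >= 0;  // central directory read successfully
        } catch (IOException e) {
            return false;           // e.g. ZipException: invalid END header
        }
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "D:\\Lucene\\document");
        List<File> corrupt = new ArrayList<>();
        File[] files = dir.listFiles((d, name) -> name.toLowerCase().endsWith(".docx"));
        if (files != null) {
            for (File f : files) {
                if (!isValidZip(f)) corrupt.add(f);
            }
        }
        corrupt.forEach(f -> System.out.println("corrupt: " + f));
    }
}
```

A similar pre-check for PDFs would look for the %PDF- header; the ZIP check only covers the OOXML formats.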


Re: Fwd: how collect a list of damaged file they can not be indexed

2016-07-13 Thread kostali hassan
Thank you Rick. The Logging section of the Solr admin UI shows only the
name of the error and its cause:

Full Import failed:java.lang.RuntimeException:
java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
to read content Processing Document # 1
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:417)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:481)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:462)
Caused by: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
to read content Processing Document # 1
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to read content Processing Document # 1
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:70)
at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:244)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:515)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
... 5 more
Caused by: org.apache.tika.exception.TikaException: Unexpected
RuntimeException from
org.apache.tika.parser.microsoft.ooxml.OOXMLParser@88ee82
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:159)
... 9 more
Caused by: org.apache.poi.openxml4j.exceptions.InvalidOperationException:
Can't open the specified file:
'D:\solr-5.4.1\server\tmp\apache-tika-417176949707403825.tmp'
at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:112)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:225)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:69)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
... 12 more
Caused by: java.util.zip.ZipException: invalid END header (bad central
directory offset)
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:220)
at java.util.zip.ZipFile.<init>(ZipFile.java:150)
at java.util.zip.ZipFile.<init>(ZipFile.java:164)
at 
org.apache.poi.openxml4j.util.ZipSecureFile.<init>(ZipSecureFile.java:105)
at 
org.apache.poi.openxml4j.opc.internal.ZipHelper.openZipFile(ZipHelper.java:175)
at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:110)


There is no trace or name of the bad files that fail indexing.


2016-06-24 12:39 GMT+00:00 Rick Leir <rl...@leirtech.com>:

> Do you mean that some of your pdf's are corrupt and Tika cannot index
> them? There should be some mention in the log file, so you can know which
> pdf is a problem. Fix it somehow and re-index.
>
>
> On June 22, 2016 9:44:01 PM EDT, kostali hassan <med.has.kost...@gmail.com>
> wrote:
>>
>> -- Forwarded message --
>> From: "kostali hassan" <med.has.kost...@gmail.com>
>> Date: 22 June 2016 14:00
>> Subject: how collect a list of damaged file they can not be indexed
>> To: <solr-user@lucene.apache.org>
>> Cc:
>>
>> I started Solr 5.4.1 to index rich data (PDF and MS Word) using the data
>> import handler.
>> In the file tika-config.xml I wrote: onError="skip"
>>
>> I want to recover the corrupted files
>>
>>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>


Re: Update index

2016-07-13 Thread kostali hassan
I tried to set a delta query:

> deltaImportQuery="SELECT * from users WHERE id='${dih.delta.id}'"
> deltaQuery="SELECT id FROM users  WHERE modified >
> '${dataimporter.last_index_time}'"

But the database I am trying to index doesn't have a modified column, just
date_creation
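A sketch of what the delta configuration could look like with only a date_creation column, under the assumption that rows are only ever inserted (a creation date cannot detect later updates to existing rows):

```xml
<!-- Sketch only: table and column names follow the thread; adjust to
     the real schema. Detects new rows since the last index run. -->
<entity name="users" query="SELECT * FROM users"
        deltaImportQuery="SELECT * FROM users WHERE id='${dih.delta.id}'"
        deltaQuery="SELECT id FROM users
                    WHERE date_creation &gt; '${dataimporter.last_index_time}'"/>
```

If rows can also be modified, the table would need a real last-modified timestamp (or triggers maintaining one) for the delta import to pick up changes.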

2016-07-13 14:11 GMT+01:00 Jamal, Sarfaraz <
sarfaraz.ja...@verizonwireless.com.invalid>:

> Hi Kostali,
>
> I would look at the Delta Queries -
>
> Sas
>
> -Original Message-
> From: kostali hassan [mailto:med.has.kost...@gmail.com]
> Sent: Wednesday, July 13, 2016 5:17 AM
> To: solr-user@lucene.apache.org
> Subject: Update index
>
> I am using Solr 5.4.1 to index a SQL database with the data import handler.
> I want the index to update automatically when the database is modified
> or new values are inserted into it.
>


Update index

2016-07-13 Thread kostali hassan
I am using Solr 5.4.1 to index a SQL database with the data import handler.
I want the index to update automatically when the database is modified
or new values are inserted into it.


Re: Searching Home's, Homes and Home

2016-07-12 Thread kostali hassan
Or you can build a file called synonyms.txt in the conf directory of your
core.
On 11 July 2016 at 17:06, "Surender" wrote:

> Thanks...
>
> I am applying these filters and will share update on this issue. It will
> take couple of days.
>
> Thanks,
> Surender Singh
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Searching-Home-s-Homes-and-Home-tp4286341p4286579.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
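As a sketch, the synonyms file suggested above could contain an entry like this for the Home's/Homes/Home case; to take effect it must be referenced from the field type's analyzer chain via a SynonymFilterFactory (the file name and placement assumed here are the Solr defaults):

```
# conf/synonyms.txt -- one comma-separated group per line;
# all terms in a group match each other at query time.
home's, homes, home
```

Note that stemming filters (as discussed in the thread) handle the plural case more generally; an explicit synonym group is the manual alternative.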


Fwd: how collect a list of damaged file they can not be indexed

2016-06-22 Thread kostali hassan
-- Forwarded message --
From: "kostali hassan" <med.has.kost...@gmail.com>
Date: 22 June 2016 14:00
Subject: how collect a list of damaged file they can not be indexed
To: <solr-user@lucene.apache.org>
Cc:

I started Solr 5.4.1 to index rich data (PDF and MS Word) using the data
import handler.
In the file tika-config.xml I wrote: onError="skip"

I want to recover the corrupted files


how collect a list of damaged file they can not be indexed

2016-06-22 Thread kostali hassan
I started Solr 5.4.1 to index rich data (PDF and MS Word) using the data
import handler.
In the file tika-config.xml I wrote: onError="skip"

I want to recover the corrupted files


solr5.4.1 : data import handler for index rich data

2016-06-06 Thread kostali hassan
I am looking to add a new field whose value is extracted from the field text:



For example, a field links to extract all links from the field text of
each file.
I defined in tika.config.xml a regex for the link expression, but when
the indexing process finishes I get just one value, even though in
schema.xml I defined the field links as multiValued="true". And I notice
the update/extract handler gets all the links automatically (multi-valued).
What do I have to do to get all links present in each file with the data
import handler?
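Getting all matches rather than only the first is, in Java's regex API, a loop over Matcher.find(). A standalone sketch of that multi-valued extraction (the URL pattern is a deliberately simple illustration, not the regex from the actual tika.config.xml):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extract *all* links from extracted text -- the multi-valued
// behaviour the RegexTransformer configuration was missing.
public class LinkExtractor {
    private static final Pattern URL = Pattern.compile("https?://[^\\s\"<>]+");

    static List<String> extractLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {          // find() advances past each match
            links.add(m.group());
        }
        return links;
    }

    public static void main(String[] args) {
        String text = "See http://lucene.apache.org and https://tika.apache.org for docs.";
        System.out.println(extractLinks(text));
    }
}
```

In DIH, the equivalent is RegexTransformer's splitBy/regex attributes on a multiValued field; this sketch only shows why a single-match extraction returns one value.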


data import handler for solr 5.4.1 to index rich Data

2016-06-02 Thread kostali hassan
I am looking to define a multi-valued field, for example the field links, to
extract all links from the field text of each file.
I defined in tika.config.xml a regex for the link expression, but when
the indexing process finishes I get just one value, even though in
schema.xml I defined the field links as multiValued="true". And I notice
the update/extract handler gets all the links automatically (multi-valued).
What do I have to do to get all links present in each file with the data
import handler?


Re: "data import handler : import data from sql database :how to search in all fields"

2016-05-26 Thread kostali hassan
I did it: I copied all my dynamic fields into the text field and it works
great. Just one question: even though I copied text into content (and the
inverse) to get highlighting, that does not work. Is there another way to
get highlighting?
Thank you Erick

2016-05-26 18:28 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:

> And, you can copy all of the fields into an "uber field" using the
> copyField directive and just search the "uber field".
>
> Best,
> Erick
>
> On Thu, May 26, 2016 at 7:35 AM, kostali hassan
> <med.has.kost...@gmail.com> wrote:
> > Thank you, it makes sense.
> > Have a good day
> >
> > 2016-05-26 15:31 GMT+01:00 Siddhartha Singh Sandhu <sandhus...@gmail.com
> >:
> >
> >> The schema.xml/managed_schema defines the default search field as
> `text`.
> >>
> >> You can make all fields that you want searchable type `text`.
> >>
> >> On Thu, May 26, 2016 at 10:23 AM, kostali hassan <
> >> med.has.kost...@gmail.com>
> >> wrote:
> >>
> >> > I import data from SQL databases with DIH. I am looking to search a
> >> > term in all fields, not field by field.
> >> >
> >>
>


Re: "data import handler : import data from sql database :how to search in all fields"

2016-05-26 Thread kostali hassan
Thank you, it makes sense.
Have a good day

2016-05-26 15:31 GMT+01:00 Siddhartha Singh Sandhu <sandhus...@gmail.com>:

> The schema.xml/managed_schema defines the default search field as `text`.
>
> You can make all fields that you want searchable type `text`.
>
> On Thu, May 26, 2016 at 10:23 AM, kostali hassan <
> med.has.kost...@gmail.com>
> wrote:
>
> > I import data from SQL databases with DIH. I am looking to search a term
> > in all fields, not field by field.
> >
>


"data import handler : import data from sql database :how to search in all fields"

2016-05-26 Thread kostali hassan
I import data from SQL databases with DIH. I am looking to search a term in
all fields, not field by field.
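The copyField approach suggested in the replies can be sketched in schema.xml like this; the catch-all field name "text" and type "text_general" follow the default schema and may differ in a real setup:

```xml
<!-- Sketch only: copy every field into one searchable catch-all field
     and make it the default search field. -->
<field name="text" type="text_general" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="*" dest="text"/>
```

Queries without an explicit field prefix then match against all copied content, while the original fields stay available for field-specific search.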


Re: trying DIH but get 'Sorry, no dataimport-handler defined!'

2016-05-24 Thread kostali hassan
If you have the config in the path server/solr/configsets/testdih/conf, you
should run this on the command line:
bin\solr create -c your_core -d testdih -p 8983
to create a core with the example config testdih.

2016-05-24 9:35 GMT+01:00 scott.chu :

>
> I do following things:
>
> * I create folder : D:\solr-6.0.0\myconfigsets\testdih.
> * Copy D:\portable_sw\solr-6.0.0\example\example-DIH\solr\db\conf to
> D:\solr-6.0.0\myconfigsets\testdih.
> * Go into D:\solr-6.0.0\myconfigsets\testdih\conf and edit
> db-data-config.xml as follows (I am pretty sure mysql environment is ok):
>
>   
>url="jdbc:mysql://localhost:3306/test" user="hello" password="hellothere" />
>   
>   
>   
>   
>   
>   
>   
>   
>   
>
> * Then I copy mysql-connector-java-5.0.8-bin.jar to
> D:\portable_sw\solr-6.0.0\server\solr-webapp\webapp\WEB-INF\lib.
> * I check solrconfig.xml  and see these relevant lines:
>
>  regex="solr-dataimporthandler-.*\.jar" />
>   ...
>   ...
>   
>   
> db-data-config.xml
>   
> 
>
> * cd to D:\solr-6.0.0, issue 'bin\solr start', it starts ok.
> * Issue 'bin\solr create_core -c testdih -d myconfigsets\testdih\conf' to
> create a core. It's ok, too.
>
> * The solr.log has these log messages:
>
> 2016-05-24 15:59:24,781 INFO  (coreLoadExecutor-6-thread-1) [   ]
> o.a.s.c.SolrResourceLoader Adding
> 'file:/D:/portable_sw/solr-6.0.0/dist/solr-dataimporthandler-6.0.0.jar' to
> classloader
> 2016-05-24 15:59:24,781 INFO  (coreLoadExecutor-6-thread-1) [   ]
> o.a.s.c.SolrResourceLoader Adding
> 'file:/D:/portable_sw/solr-6.0.0/dist/solr-dataimporthandler-extras-6.0.0.jar'
> to classloader
>
> * So I think dih jars are loaded ok.
>
> I go to localhost:8983 in browser and select core 'testdih', then click
> 'DataImport' item but rightpane shows "Sorry, no dataimport-handler
> defined!".
>
>  What do I miss?
>
>
> scott.chu,scott@udngroup.com
> 2016/5/24 (週二)
>


Re: Indexing docuements in Solr 5 Using Tika extraction error

2016-03-25 Thread kostali hassan
tank you shawn ; but if I use solarium client PHP for the production what I
have to do in this case.

2016-03-25 13:44 GMT+00:00 Shawn Heisey :

> On 3/25/2016 5:44 AM, Moncif Aidi wrote:
> > Im Using solr 5.4.1 for indexing thousands of documents, and it works
> > perfectly.The issue comes when some documents are not well formatted or
> > contains some special characters and it makes solr hangs or blocked on
> some
> > perticular documents and it gives these errors when viewing the log :
> > i want to detect what files are causing these problems, or at least point
> > me to some library Im missing. Thanks in advance
>
> Tika is known for problems like this, particularly with PDF and
> Microsoft Office documents.
>
> This is one of the hazards of running with the Tika application built
> into Solr's Extracting Request Handler.  You can't get any good
> information out of Solr about what went wrong, and any severe problems
> with Tika might actually cause Solr to completely crash.
>
> If you're going to use Tika for production indexing, you should write a
> Java program using SolrJ and Tika so that you are in complete control,
> and so Solr isn't unstable.
>
> Thanks,
> Shawn
>
>


indexing rich data using DIH from solr 5.4.1

2016-03-25 Thread kostali hassan
Some documents have content that cannot be extracted and gets stuck in
Solr's JVM; I get this ERROR:

24/03/2016 19:26:59 ERROR null DocBuilder Exception while processing:
files document : null:org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to read content Processing Document # 1

Exception while processing: files document :
null:org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to read content Processing Document # 1
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:70)
at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:244)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:515)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:417)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:481)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:462)
Caused by: org.apache.tika.exception.TikaException: Unexpected
RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser@2cc58e97
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:258)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:159)
... 9 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out
of range: -1
at java.lang.String.substring(Unknown Source)
at 
org.apache.tika.parser.microsoft.WordExtractor.handleSpecialCharacterRuns(WordExtractor.java:407)
at 
org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:256)
at 
org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:196)
at 
org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:105)
at 
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:201)
at 
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:172)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
... 12 more


DIH can't index web addresses

2016-03-22 Thread kostali hassan
I try to index rich data (msword and pdf) but when a content of document
have multiple liens (web adress) i get an ERROR in log .
what i have to add in my tika-config.xml to index web path .


Re: sorry, no dataimport-handler defined!

2016-02-03 Thread kostali hassan
In the request handler section of solrconfig.xml do:

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">tika-data-config.xml</str>
  </lst>
</requestHandler>

then define your file tika-data-config.xml and put it in the conf
directory of your core.

2016-02-02 17:35 GMT+00:00 Jean-Jacques Monot :

> Exact. Newbie user !
>
> OK i have seen what is missing ...
>
> On 2 February 2016 at 15:40, "Davis, Daniel (NIH/NLM) [C]"
> wrote:
> >
> > It sounds a bit like you are just exploring Solr for the first time.
> To use the Data Import Handler, you need to create an XML file that
> configures it, data-config.xml by default.
> >
> > But before we go into details, what are you trying to accomplish with
> Solr?
> >
> > -Original Message-
> > From: Jean-Jacques MONOT [mailto:jj_mo...@yahoo.fr]
> > Sent: Monday, February 01, 2016 2:31 PM
> > To: solr-user@lucene.apache.org
> > Subject: Potential SPAM:sorry, no dataimport-handler defined!
> >
> > Hello
> >
> > I am using SOLR 5.4.1 and the graphical admin UI.
> >
> > I successfully created multiples cores and indexed various documents,
> using in line commands : (create -c) and (post.jar) on W10.
> >
> > But in the GUI, when I click on "Dataimport", I get the following
> message : "sorry, no dataimport-handler defined!"
> >
> > I get the same message even on 5.3.1 or for different cores.
> >
> > What is wrong ?
> >
> > JJM
> >
> >
> >
>


Re: indexing rich data with solr 5.3.1 integreting in Ubuntu server

2016-01-26 Thread kostali hassan
They are loaded, because Solr does index .doc and .docx (MS Word) and fails
only for PDF files.

2016-01-26 12:49 GMT+00:00 Emir Arnautovic <emir.arnauto...@sematext.com>:

> Hi,
> I would first check if external libraries are present and loaded. How do
> you start Solr? Try explicitly setting solr.install.dir or set absolute
> path to libs and see in logs if they are loaded.
>
>  regex=".*\.jar" />
>
>
> Thanks,
> Emir
>
> On 25.01.2016 15:16, kostali hassan wrote:
>
>> http://stackoverflow.com/questions/34962280/solr-indexing-pdf-attachments-not-working-in-ubuntu
>>
>>
>> I have a problem with integrating solr in Ubuntu server.Before using solr
>> on ubuntu server i tested it on my mac it was working perfectly for DIH
>> request handler and update/extract. it indexed my PDF,Doc,Docx
>> documents.so
>> after installing solr on ubuntu server and using the same configuration
>> files and librairies. i've found out that solr doesn't index PDf documents
>> and none Error and any exceptions in solr log.But i can search over .Doc
>> and .Docx documents.
>>
>> here some parts of my solrconfig.xml contents :
>>
>> > regex=".*\.jar" />
>>> regex="solr-cell-\d.*\.jar" />
>>
>> >startup="lazy"
>>class="solr.extraction.ExtractingRequestHandler" >
>>  
>>true
>>ignored_
>>_text_
>>  
>>
>>
>> DIH config:
>>
>> > class="org.apache.solr.handler.dataimport.DataImportHandler">
>> 
>> tika.config.xml
>> 
>> 
>>
>> tika.config.xml
>>
>> 
>>  
>>  
>>  > dataSource="null" rootEntity="false"
>>  baseDir="D:\Lucene\document"
>> fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
>> onError="skip"
>>  recursive="true">
>>  
>>  
>>  
>>   
>> >  name="documentImport"
>> dataSource="files"
>>  processor="TikaEntityProcessor"
>>  url="${files.fileAbsolutePath}"
>>  format="text">
>>
>>
>>  
>> > name="title" meta="true"/>
>>  
>>
>> > name="content"/>
>>  > name="LastModifiedBy" meta="true"/>
>>  
>>  
>>  
>> 
>>
>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


indexing rich data with solr 5.3.1 integreting in Ubuntu server

2016-01-25 Thread kostali hassan
I have a problem integrating Solr on an Ubuntu server. Before using Solr
on the Ubuntu server I tested it on my Mac, where it worked perfectly for
the DIH request handler and update/extract; it indexed my PDF, DOC and DOCX
documents. After installing Solr on the Ubuntu server with the same
configuration files and libraries, I found that Solr doesn't index PDF
documents, with no error and no exceptions in the Solr log. But I can
search over .doc and .docx documents.

here some parts of my solrconfig.xml contents :


  



  true
  ignored_
  _text_

  

DIH config:



tika.config.xml



tika.config.xml








 
   














indexing rich data with solr 5.3.1 integreting in Ubuntu server

2016-01-23 Thread kostali hassan
I have a problem integrating Solr on an Ubuntu server. Before using Solr
on the Ubuntu server I tested it on my Mac, where it worked perfectly for
the DIH request handler and update/extract; it indexed my PDF, DOC and DOCX
documents. After installing Solr on the Ubuntu server with the same
configuration files and libraries, I found that Solr doesn't index PDF
documents, with no error and no exceptions in the Solr log. But I can
search over .doc and .docx documents.

here some parts of my solrconfig.xml contents :


  



  true
  ignored_
  _text_

  

DIH config:



tika.config.xml



tika.config.xml








 
   














Re: indexing rich data with solr 5.3

2016-01-15 Thread kostali hassan
Thank you Erik for your valuable advice.

2016-01-14 17:24 GMT+00:00 Erik Hatcher <erik.hatc...@gmail.com>:

> And also, bin/post can be your friend when it comes to troubleshooting or
> introspecting Tika parsing via /update/extract.  Like this:
>
> $ bin/post -c test -params "extractOnly=true&wt=ruby&indent=yes" -out yes
> docs/SYSTEM_REQUIREMENTS.html
> java -classpath /Users/erikhatcher/solr-5.3.0/dist/solr-core-5.3.0.jar
> -Dauto=yes -Dparams=extractOnly=true&wt=ruby&indent=yes -Dout=yes -Dc=test
> -Ddata=files org.apache.solr.util.SimplePostTool
> /Users/erikhatcher/solr-5.3.0/docs/SYSTEM_REQUIREMENTS.html
> SimplePostTool version 5.0.0
> Posting files to [base] url
> http://localhost:8983/solr/test/update?extractOnly=true&wt=ruby&indent=yes.
> ..
> Entering auto mode. File endings considered are
> xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> POSTing file SYSTEM_REQUIREMENTS.html (text/html) to [base]/extract
> {
>   'responseHeader'=>{
> 'status'=>0,
> 'QTime'=>3},
>   ''=>'
> http://www.w3.org/1999/xhtml;>
> 
> 
>- from
> https://lucidworks.com/blog/2015/08/04/solr-5-new-binpost-utility/
>
> But I also recommend having the Tika desktop app handy, in which you can
> drag and drop a file and see the gory details of how it parses the file.
>
> —
> Erik Hatcher, Senior Solutions Architect
> http://www.lucidworks.com
>
>
>
> > On Jan 14, 2016, at 10:55 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
> >
> > No good way except to try them. For getting details on Tika parsing
> > failures, I much prefer the SolrJ process that the link I sent you
> > outlines.
> >
> > Best,
> > Erick
> >
> > On Thu, Jan 14, 2016 at 7:52 AM, kostali hassan
> > <med.has.kost...@gmail.com> wrote:
> >> Thank you Erick, I have problems with these files; last question: how do
> >> I identify or get the list of files that can't be indexed (bad files)?
> >>
> >>
> >>
> >>>
> >>>
> >>>
> >>>
>
>


Fwd: indexing rich data with solr 5.3

2016-01-14 Thread kostali hassan
Thank you Erick, I have problems with these files; last question: how do I
identify or get the list of files that can't be indexed (bad files)?


>
>
>
>


Re: indexing rich data with solr 5.3

2016-01-12 Thread kostali hassan
Yes, I am indexing other files successfully with DIH; now when I try to index
these files with the ExtractingRequestHandler I get this ERROR:

null:org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Error creating OOXML
extractor
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tika.exception.TikaException: Error creating
OOXML extractor
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:122)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:221)
... 27 more
Caused by: org.apache.poi.openxml4j.exceptions.InvalidFormatException:
Package should contain a content type part [M1.13]
at 
org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:203)
at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:673)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:274)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:73)


2016-01-12 1:23 GMT+00:00 Erick Erickson <erickerick...@gmail.com>:

> Looks like a bad file. Do you have any success using DIH on any files?
>
> What happens if you just send that particular file throug the
>  ExtractingRequestHandler?
>
> Best,
> Erick
>
> On Mon, Jan 11, 2016 at 3:51 PM, kostali hassan
> <med.has.kost...@gmail.com> wrote:
> > such files msword and pdf donsnt indexing using *dataimoprt i have this
> > error:*
> >
> > Full Import failed:java.lang.RuntimeException:
> > java.lang.RuntimeException:
> > org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
> > to read content Processing Document # 2
> > at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
> > at
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
> > at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
> > at
>
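Following Erick's suggestion, a single suspect file can be posted directly to the ExtractingRequestHandler from the command line to see the full error it produces. This is a hedged sketch: core name ("mycore"), port, document id and file path are examples, not values from the thread.

```shell
# Post one file straight to /update/extract and watch the response / Solr log
curl "http://localhost:8983/solr/mycore/update/extract?literal.id=doc1&commit=true" \
  -F "myfile=@D:/Lucene/document/example.docx"
```

If the same stack trace appears here, the file itself (not the DIH configuration) is the problem.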

indexing rich data with solr 5.3

2016-01-11 Thread kostali hassan
MS Word and PDF files are not being indexed when using dataimport; I get this
error:

Full Import failed:java.lang.RuntimeException:
java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
to read content Processing Document # 2
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
to read content Processing Document # 2
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to read content Processing Document # 2
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:70)
at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:168)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:514)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
... 5 more
Caused by: org.apache.tika.exception.TikaException: Unexpected
RuntimeException from
org.apache.tika.parser.microsoft.ooxml.OOXMLParser@188120
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:258)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:162)
... 9 more
Caused by: org.apache.poi.openxml4j.exceptions.InvalidOperationException:
Can't open the specified file:
'D:\solr\solr-5.3.1\server\tmp\apache-tika-121920532070319073.tmp'
at org.apache.poi.openxml4j.opc.ZipPackage.&lt;init&gt;(ZipPackage.java:112)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:224)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:69)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
... 12 more
Caused by: java.util.zip.ZipException: invalid END header (bad central
directory offset)
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.&lt;init&gt;(ZipFile.java:220)
at java.util.zip.ZipFile.&lt;init&gt;(ZipFile.java:150)
at java.util.zip.ZipFile.&lt;init&gt;(ZipFile.java:164)
at 
org.apache.poi.openxml4j.opc.internal.ZipHelper.openZipFile(ZipHelper.java:174)
at org.apache.poi.openxml4j.opc.ZipPackage.&lt;init&gt;(ZipPackage.java:110)
... 16 more
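The root cause at the bottom of this trace ("invalid END header (bad central directory offset)") comes from java.util.zip: a .docx file is really a ZIP archive, and this error means the bytes on disk are not a well-formed ZIP (truncated copy, wrong extension, and so on). A quick sanity check before re-indexing, sketched here with a stand-in path:

```shell
# A valid ZIP/.docx starts with the two bytes "PK"; anything else will make
# java.util.zip (and therefore Tika/POI) fail exactly like the trace above.
f=/tmp/sample.docx                      # stand-in path; use the real file
printf 'not really a zip' > "$f"        # simulate a damaged file
if [ "$(head -c 2 "$f")" = "PK" ]; then
  echo "ZIP signature present"
else
  echo "no ZIP signature - Tika/POI will fail on this file"
fi
```

Files that fail this check should be skipped (DIH's onError="skip") or re-exported from their source.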


Re: Re: secure solr 5.3.1

2015-12-10 Thread kostali hassan
I am looking to secure my Solr running in standalone mode on Windows; the
Kerberos plugin is the only one able to secure Solr in standalone mode. How do
I create the principal and its password?

2015-12-10 9:35 GMT+00:00 kostali hassan <med.has.kost...@gmail.com>:

> Iam looking to secure my solr runing in standalone Mode the kerberose
> plugin is only able to secure solr in standalone mode.
>
> 2015-12-09 23:00 GMT+00:00 kostali hassan <med.has.kost...@gmail.com>:
>
>> -- Message transféré --
>> De : "Ishan Chattopadhyaya" <ichattopadhy...@gmail.com>
>> Date : 9 déc. 2015 19:39
>> Objet : Re: secure solr 5.3.1
>> À : <solr-user@lucene.apache.org>
>> Cc :
>>
>> I don't have much personal experience with setting up a kerberos server on
>> a Windows machine, but I remember things being painful when I tried and
>> failed once. If you have an option to use a VM, I suggest try setting up
>> the KDC in a GNU/Linux VM (through VirtualBox).
>> In that case, make sure the Windows host is able to access the ports of
>> the
>> guest machine. Better still, setup GNU/Linux VMs for Solr too.
>>
>> On Thu, Dec 10, 2015 at 12:39 AM, kostali hassan <
>> med.has.kost...@gmail.com>
>> wrote:
>>
>> > I install MIT Kerberos for Windows 4.0.1
>> >
>> > 2015-12-09 19:05 GMT+00:00 Ishan Chattopadhyaya <
>> ichattopadhy...@gmail.com
>> > >:
>> >
>> > > What exactly is your confusion? Do you have access to a KDC?
>> > >
>> > > Briefly:
>> > > Login to your KDC server, do a kadmin.local:
>> > > Then,
>> > > addprinc HTTP/192.168.0.107
>> > > ktadd -k /tmp/107.keytab HTTP/192.168.0.107
>> > >
>> > > Then copy the keytab file to your solr node to the appropriate places.
>> > >
>> > >
>> > > On Thu, Dec 10, 2015 at 12:08 AM, kostali hassan <
>> > > med.has.kost...@gmail.com>
>> > > wrote:
>> > >
>> > > > I folow this two resources and Iam stuck in
>> > > >
>> > > >- Create service principals and keytab files.
>> > > >
>> > > >
>> > > > 2015-12-09 18:06 GMT+00:00 Ishan Chattopadhyaya <
>> > > ichattopadhy...@gmail.com
>> > > > >:
>> > > >
>> > > > > The kerberos plugin is available for use with Solr out of the box.
>> > The
>> > > > two
>> > > > > resources which Bosco mentioned should get you up and running.
>> > > > >
>> > > > > On Wed, Dec 9, 2015 at 11:34 PM, Don Bosco Durai <
>> bo...@apache.org>
>> > > > wrote:
>> > > > >
>> > > > > > There are two resources available:
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/solr/Kerberos+Authentication+Plugin
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/RANGER/How+to+configure+Solr+Cloud+with+Kerberos+for+Ranger+0.5
>> > > > > >
>> > > > > >
>> > > > > > Bosco
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On 12/9/15, 3:14 AM, "kostali hassan" <
>> med.has.kost...@gmail.com>
>> > > > wrote:
>> > > > > >
>> > > > > > >how I setting up Solr to use Kerberos ? i have to dowload
>> kerberos
>> > > and
>> > > > > put
>> > > > > > >the plug-in implementation in the classpath(/server/solr).
>> > > > > > >
>> > > > > > >2015-12-08 22:19 GMT+00:00 Ishan Chattopadhyaya <
>> > > > > > ichattopadhy...@gmail.com>:
>> > > > > > >
>> > > > > > >> Right, as Bosco said, this has been tested well and
>> supported on
>> > > > > > SolrCloud.
>> > > > > > >> It should be possible to run it in standalone mode, but it is
>> > no

Re: Re: secure solr 5.3.1

2015-12-10 Thread kostali hassan
To set up Kerberos on Windows:

https://github.com/krb5/krb5/blob/master/src/windows/README

2015-12-10 12:10 GMT+00:00 Ishan Chattopadhyaya <ichattopadhy...@gmail.com>:

> 1. Please set up your Kerberos server. (KDC)
> 2. Create principals (usernames) and keytab files as mentioned in the
> document.
> 3. Copy the keytab files to your Solr machines
> 4. Mention all parameters in your bin/solr.in.sh file.
> 5. Start Solr using a separate parameter,
> "-DauthenticationPlugin=org.apache.solr.security.KerberosPlugin". You can
> put this parameter also in your bin/solr.in.sh file.
>
> All the best,
> Regards,
> Ishan
>
>
> On Thu, Dec 10, 2015 at 5:19 PM, kostali hassan <med.has.kost...@gmail.com
> >
> wrote:
>
> > Iam looking to secure my solr runing in standalone Mode within windows
> ;the
> > kerberose plugin is only able to secure solr in standalone mode. how
> create
> > principale and here password.
> >
> > 2015-12-10 9:35 GMT+00:00 kostali hassan <med.has.kost...@gmail.com>:
> >
> > > Iam looking to secure my solr runing in standalone Mode the kerberose
> > > plugin is only able to secure solr in standalone mode.
> > >
> > > 2015-12-09 23:00 GMT+00:00 kostali hassan <med.has.kost...@gmail.com>:
> > >
> > >> -- Message transféré --
> > >> De : "Ishan Chattopadhyaya" <ichattopadhy...@gmail.com>
> > >> Date : 9 déc. 2015 19:39
> > >> Objet : Re: secure solr 5.3.1
> > >> À : <solr-user@lucene.apache.org>
> > >> Cc :
> > >>
> > >> I don't have much personal experience with setting up a kerberos
> server
> > on
> > >> a Windows machine, but I remember things being painful when I tried
> and
> > >> failed once. If you have an option to use a VM, I suggest try setting
> up
> > >> the KDC in a GNU/Linux VM (through VirtualBox).
> > >> In that case, make sure the Windows host is able to access the ports
> of
> > >> the
> > >> guest machine. Better still, setup GNU/Linux VMs for Solr too.
> > >>
> > >> On Thu, Dec 10, 2015 at 12:39 AM, kostali hassan <
> > >> med.has.kost...@gmail.com>
> > >> wrote:
> > >>
> > >> > I install MIT Kerberos for Windows 4.0.1
> > >> >
> > >> > 2015-12-09 19:05 GMT+00:00 Ishan Chattopadhyaya <
> > >> ichattopadhy...@gmail.com
> > >> > >:
> > >> >
> > >> > > What exactly is your confusion? Do you have access to a KDC?
> > >> > >
> > >> > > Briefly:
> > >> > > Login to your KDC server, do a kadmin.local:
> > >> > > Then,
> > >> > > addprinc HTTP/192.168.0.107
> > >> > > ktadd -k /tmp/107.keytab HTTP/192.168.0.107
> > >> > >
> > >> > > Then copy the keytab file to your solr node to the appropriate
> > places.
> > >> > >
> > >> > >
> > >> > > On Thu, Dec 10, 2015 at 12:08 AM, kostali hassan <
> > >> > > med.has.kost...@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > I folow this two resources and Iam stuck in
> > >> > > >
> > >> > > >- Create service principals and keytab files.
> > >> > > >
> > >> > > >
> > >> > > > 2015-12-09 18:06 GMT+00:00 Ishan Chattopadhyaya <
> > >> > > ichattopadhy...@gmail.com
> > >> > > > >:
> > >> > > >
> > >> > > > > The kerberos plugin is available for use with Solr out of the
> > box.
> > >> > The
> > >> > > > two
> > >> > > > > resources which Bosco mentioned should get you up and running.
> > >> > > > >
> > >> > > > > On Wed, Dec 9, 2015 at 11:34 PM, Don Bosco Durai <
> > >> bo...@apache.org>
> > >> > > > wrote:
> > >> > > > >
> > >> > > > > > There are two resources available:
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/solr/
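For reference, on a Unix-like KDC host the five steps quoted above look roughly like the sketch below. The principal, realm and file paths are examples, and the solr.kerberos.* property names should be double-checked against the Kerberos Authentication Plugin page:

```shell
# On the KDC (same kadmin commands quoted earlier in the thread)
kadmin.local -q "addprinc -randkey HTTP/192.168.0.107"
kadmin.local -q "ktadd -k /tmp/107.keytab HTTP/192.168.0.107"

# Copy /tmp/107.keytab to the Solr node, then add to bin/solr.in.sh:
SOLR_AUTHENTICATION_OPTS="-DauthenticationPlugin=org.apache.solr.security.KerberosPlugin \
  -Djava.security.auth.login.config=/etc/solr/jaas-client.conf \
  -Dsolr.kerberos.principal=HTTP/192.168.0.107@EXAMPLE.COM \
  -Dsolr.kerberos.keytab=/etc/solr/107.keytab"
```

On Windows the same parameters go into bin\solr.in.cmd instead.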

Re: secure solr 5.3.1

2015-12-09 Thread kostali hassan
How do I set up Solr to use Kerberos? Do I have to download Kerberos and put
the plug-in implementation in the classpath (/server/solr)?

2015-12-08 22:19 GMT+00:00 Ishan Chattopadhyaya <ichattopadhy...@gmail.com>:

> Right, as Bosco said, this has been tested well and supported on SolrCloud.
> It should be possible to run it in standalone mode, but it is not something
> that has been well test yet.
>
> On Tue, Dec 8, 2015 at 11:02 PM, Don Bosco Durai <bo...@apache.org> wrote:
>
> > It was tested and meant to work only in SolrCloud mode.
> >
> >
> >
> >
> >
> >
> > On Tue, Dec 8, 2015 at 9:30 AM -0800, "kostali hassan" <
> > med.has.kost...@gmail.com> wrote:
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >- Kerberos authentication
> >:
> >work in SolrCloud or standalone mode but the documentation is not
> clear
> >-
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Kerberos+Authentication+Plugin?focusedCommentId=61331746#comment-61331746
> >
> >
> > 2015-12-08 17:14 GMT+00:00 Don Bosco Durai :
> >
> > > Not sure exactly what you mean here. Even if you are running in
> > SolrCloud,
> > > you can access it using URL. So there won't be any change on the client
> > > side.
> > > Bosco
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Dec 8, 2015 at 2:03 AM -0800, "kostali hassan" <
> > > med.has.kost...@gmail.com> wrote:
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > if I run solr in SolrCloud mode , my web hosting shoud be Cloud web
> > > hosting? or dont need a web server having cloud..?
> > >
> > > 2015-12-08 1:58 GMT+00:00 Don Bosco Durai :
> > >
> > > > Have you considered running your Solr as SolrCloud with embedded
> > > zookeeper?
> > > >
> > > > If you do, you have multiple options. Basic Auth, Kerberos and
> > > > authorization support.
> > > >
> > > >
> > > > Bosco
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On 12/7/15, 7:03 AM, "kostali hassan"  wrote:
> > > >
> > > > >How I shoud secure my server of solr 5 .3.1 in  single-node Mode. I
> Am
> > > > >searching for the best way to secure my server solr but I found only
> > for
> > > > >cloud mode.
> > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> >
> >
> >
>


Re: secure solr 5.3.1

2015-12-09 Thread kostali hassan
I installed MIT Kerberos for Windows 4.0.1.

2015-12-09 19:05 GMT+00:00 Ishan Chattopadhyaya <ichattopadhy...@gmail.com>:

> What exactly is your confusion? Do you have access to a KDC?
>
> Briefly:
> Login to your KDC server, do a kadmin.local:
> Then,
> addprinc HTTP/192.168.0.107
> ktadd -k /tmp/107.keytab HTTP/192.168.0.107
>
> Then copy the keytab file to your solr node to the appropriate places.
>
>
> On Thu, Dec 10, 2015 at 12:08 AM, kostali hassan <
> med.has.kost...@gmail.com>
> wrote:
>
> > I folow this two resources and Iam stuck in
> >
> >- Create service principals and keytab files.
> >
> >
> > 2015-12-09 18:06 GMT+00:00 Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com
> > >:
> >
> > > The kerberos plugin is available for use with Solr out of the box. The
> > two
> > > resources which Bosco mentioned should get you up and running.
> > >
> > > On Wed, Dec 9, 2015 at 11:34 PM, Don Bosco Durai <bo...@apache.org>
> > wrote:
> > >
> > > > There are two resources available:
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Kerberos+Authentication+Plugin
> > > >
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/RANGER/How+to+configure+Solr+Cloud+with+Kerberos+for+Ranger+0.5
> > > >
> > > >
> > > > Bosco
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On 12/9/15, 3:14 AM, "kostali hassan" <med.has.kost...@gmail.com>
> > wrote:
> > > >
> > > > >how I setting up Solr to use Kerberos ? i have to dowload kerberos
> and
> > > put
> > > > >the plug-in implementation in the classpath(/server/solr).
> > > > >
> > > > >2015-12-08 22:19 GMT+00:00 Ishan Chattopadhyaya <
> > > > ichattopadhy...@gmail.com>:
> > > > >
> > > > >> Right, as Bosco said, this has been tested well and supported on
> > > > SolrCloud.
> > > > >> It should be possible to run it in standalone mode, but it is not
> > > > something
> > > > >> that has been well test yet.
> > > > >>
> > > > >> On Tue, Dec 8, 2015 at 11:02 PM, Don Bosco Durai <
> bo...@apache.org>
> > > > wrote:
> > > > >>
> > > > >> > It was tested and meant to work only in SolrCloud mode.
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > On Tue, Dec 8, 2015 at 9:30 AM -0800, "kostali hassan" <
> > > > >> > med.has.kost...@gmail.com> wrote:
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >- Kerberos authentication
> > > > >> >:
> > > > >> >work in SolrCloud or standalone mode but the documentation is
> > not
> > > > >> clear
> > > > >> >-
> > > > >> >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Kerberos+Authentication+Plugin?focusedCommentId=61331746#comment-61331746
> > > > >> >
> > > > >> >
> > > > >> > 2015-12-08 17:14 GMT+00:00 Don Bosco Durai :
> > > > >> >
> > > > >> > > Not sure exactly what you mean here. Even if you are running
> in
> > > > >> > SolrCloud,
> > > > >> > > you can access it using URL. So there won't be any change on
> the
> > > > client
> > > > >> > > side.
> > > > >> > > Bosco
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > > On Tue, Dec 8, 2015 at 2:03 AM -0800, "kostali hassan" <
> > > > >> > > med.has.kost...@gmail.com> wrote:
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > > if I run solr in SolrCloud mode , my web hosting shoud be
> Cloud
> > > web
> > > > >> > > hosting? or dont need a web server having cloud..?
> > > > >> > >
> > > > >> > > 2015-12-08 1:58 GMT+00:00 Don Bosco Durai :
> > > > >> > >
> > > > >> > > > Have you considered running your Solr as SolrCloud with
> > embedded
> > > > >> > > zookeeper?
> > > > >> > > >
> > > > >> > > > If you do, you have multiple options. Basic Auth, Kerberos
> and
> > > > >> > > > authorization support.
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > Bosco
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On 12/7/15, 7:03 AM, "kostali hassan"  wrote:
> > > > >> > > >
> > > > >> > > > >How I shoud secure my server of solr 5 .3.1 in  single-node
> > > > Mode. I
> > > > >> Am
> > > > >> > > > >searching for the best way to secure my server solr but I
> > found
> > > > only
> > > > >> > for
> > > > >> > > > >cloud mode.
> > > > >> > > >
> > > > >> > > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >>
> > > >
> > > >
> > >
> >
>


kerberos and solr5 Service Principals and Keytab Files

2015-12-09 Thread kostali hassan
I am trying to secure Solr using the Kerberos plugin. I want to test Kerberos
on localhost, but I don't know how to create a Kerberos principal at the KDC
server, or where to generate the keytab file (the KDC server's /tmp/107.keytab).


Re: secure solr 5.3.1

2015-12-09 Thread kostali hassan
I followed these two resources and I am stuck at:

   - Create service principals and keytab files.


2015-12-09 18:06 GMT+00:00 Ishan Chattopadhyaya <ichattopadhy...@gmail.com>:

> The kerberos plugin is available for use with Solr out of the box. The two
> resources which Bosco mentioned should get you up and running.
>
> On Wed, Dec 9, 2015 at 11:34 PM, Don Bosco Durai <bo...@apache.org> wrote:
>
> > There are two resources available:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Kerberos+Authentication+Plugin
> >
> >
> >
> >
> https://cwiki.apache.org/confluence/display/RANGER/How+to+configure+Solr+Cloud+with+Kerberos+for+Ranger+0.5
> >
> >
> > Bosco
> >
> >
> >
> >
> >
> > On 12/9/15, 3:14 AM, "kostali hassan" <med.has.kost...@gmail.com> wrote:
> >
> > >how I setting up Solr to use Kerberos ? i have to dowload kerberos and
> put
> > >the plug-in implementation in the classpath(/server/solr).
> > >
> > >2015-12-08 22:19 GMT+00:00 Ishan Chattopadhyaya <
> > ichattopadhy...@gmail.com>:
> > >
> > >> Right, as Bosco said, this has been tested well and supported on
> > SolrCloud.
> > >> It should be possible to run it in standalone mode, but it is not
> > something
> > >> that has been well test yet.
> > >>
> > >> On Tue, Dec 8, 2015 at 11:02 PM, Don Bosco Durai <bo...@apache.org>
> > wrote:
> > >>
> > >> > It was tested and meant to work only in SolrCloud mode.
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Tue, Dec 8, 2015 at 9:30 AM -0800, "kostali hassan" <
> > >> > med.has.kost...@gmail.com> wrote:
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >- Kerberos authentication
> > >> >:
> > >> >work in SolrCloud or standalone mode but the documentation is not
> > >> clear
> > >> >-
> > >> >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/solr/Kerberos+Authentication+Plugin?focusedCommentId=61331746#comment-61331746
> > >> >
> > >> >
> > >> > 2015-12-08 17:14 GMT+00:00 Don Bosco Durai :
> > >> >
> > >> > > Not sure exactly what you mean here. Even if you are running in
> > >> > SolrCloud,
> > >> > > you can access it using URL. So there won't be any change on the
> > client
> > >> > > side.
> > >> > > Bosco
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Tue, Dec 8, 2015 at 2:03 AM -0800, "kostali hassan" <
> > >> > > med.has.kost...@gmail.com> wrote:
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > if I run solr in SolrCloud mode , my web hosting shoud be Cloud
> web
> > >> > > hosting? or dont need a web server having cloud..?
> > >> > >
> > >> > > 2015-12-08 1:58 GMT+00:00 Don Bosco Durai :
> > >> > >
> > >> > > > Have you considered running your Solr as SolrCloud with embedded
> > >> > > zookeeper?
> > >> > > >
> > >> > > > If you do, you have multiple options. Basic Auth, Kerberos and
> > >> > > > authorization support.
> > >> > > >
> > >> > > >
> > >> > > > Bosco
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On 12/7/15, 7:03 AM, "kostali hassan"  wrote:
> > >> > > >
> > >> > > > >How I shoud secure my server of solr 5 .3.1 in  single-node
> > Mode. I
> > >> Am
> > >> > > > >searching for the best way to secure my server solr but I found
> > only
> > >> > for
> > >> > > > >cloud mode.
> > >> > > >
> > >> > > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >>
> >
> >
>


Re: secure solr 5.3.1

2015-12-08 Thread kostali hassan
If I run Solr in SolrCloud mode, does my web hosting need to be cloud hosting,
or is an ordinary web server enough?

2015-12-08 1:58 GMT+00:00 Don Bosco Durai <bo...@apache.org>:

> Have you considered running your Solr as SolrCloud with embedded zookeeper?
>
> If you do, you have multiple options. Basic Auth, Kerberos and
> authorization support.
>
>
> Bosco
>
>
>
>
>
> On 12/7/15, 7:03 AM, "kostali hassan" <med.has.kost...@gmail.com> wrote:
>
> >How I shoud secure my server of solr 5 .3.1 in  single-node Mode. I Am
> >searching for the best way to secure my server solr but I found only for
> >cloud mode.
>
>


Re: secure solr 5.3.1

2015-12-08 Thread kostali hassan
   - Kerberos authentication
     <https://cwiki.apache.org/confluence/display/solr/Kerberos+Authentication+Plugin>:
     works in SolrCloud or standalone mode, but the documentation is not clear
   - https://cwiki.apache.org/confluence/display/solr/Kerberos+Authentication+Plugin?focusedCommentId=61331746#comment-61331746


2015-12-08 17:14 GMT+00:00 Don Bosco Durai <bo...@apache.org>:

> Not sure exactly what you mean here. Even if you are running in SolrCloud,
> you can access it using URL. So there won't be any change on the client
> side.
> Bosco
>
>
>
>
>
>
> On Tue, Dec 8, 2015 at 2:03 AM -0800, "kostali hassan" <
> med.has.kost...@gmail.com> wrote:
>
>
>
>
>
>
>
>
>
>
> if I run solr in SolrCloud mode , my web hosting shoud be Cloud web
> hosting? or dont need a web server having cloud..?
>
> 2015-12-08 1:58 GMT+00:00 Don Bosco Durai :
>
> > Have you considered running your Solr as SolrCloud with embedded
> zookeeper?
> >
> > If you do, you have multiple options. Basic Auth, Kerberos and
> > authorization support.
> >
> >
> > Bosco
> >
> >
> >
> >
> >
> > On 12/7/15, 7:03 AM, "kostali hassan"  wrote:
> >
> > >How I shoud secure my server of solr 5 .3.1 in  single-node Mode. I Am
> > >searching for the best way to secure my server solr but I found only for
> > >cloud mode.
> >
> >
>
>
>
>
>
>


secure solr 5.3.1

2015-12-07 Thread kostali hassan
How should I secure my Solr 5.3.1 server in single-node mode? I am searching
for the best way to secure my Solr server, but I have only found documentation
for cloud mode.


Re: schema fileds and Typefield in solr-5.3.1

2015-12-05 Thread kostali hassan
I fixed the problem using the dataimport requestHandler:

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">tika-data-config.xml</str>
  </lst>
</requestHandler>

I configured the tika-data-config.xml according to my needs to get the right
values:

<dataConfig>
  <document>
    <entity name="files" processor="FileListEntityProcessor"
            dataSource="null" rootEntity="false"
            baseDir="D:\Lucene\document"
            fileName=".*.(doc)|(pdf)|(docx)"
            onError="skip"
            recursive="true">
      <entity name="documentImport"
              processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}"
              format="text">
      </entity>
    </entity>
  </document>
</dataConfig>

Now there is no need to index from the command line with SimplePostTool: just
go to the dataimport page in the web admin and execute a full import.
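For reference, the same full-import the admin UI runs can also be triggered and monitored over plain HTTP, which is handy for scripting; host and core name ("mycore") are examples:

```shell
# Kick off a DataImportHandler full import, then poll its status
curl "http://localhost:8983/solr/mycore/dataimport?command=full-import"
curl "http://localhost:8983/solr/mycore/dataimport?command=status"
```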

2015-12-04 17:05 GMT+00:00 kostali hassan <med.has.kost...@gmail.com>:

> thank you , that's why I choose to add the exact value using solarium PHP
> Client, but the time out stop indexing after 30seconde:
>
> $dir = new Folder($dossier);
> $files = $dir->find('.*\.*');
> foreach ($files as $file) {
> $file = new File($dir->pwd() . DS . $file);
>
> $query = $client->createExtract();
> $query->setFile($file->pwd());
> $query->setCommit(true);
> $query->setOmitHeader(false);
>
> $doc = $query->createDocument();
> $doc->id =$file->pwd();
> $doc->name = $file->name;
> $doc->title = $file->name();
>
> $query->setDocument($doc);
>
> 2015-12-04 16:50 GMT+00:00 Erik Hatcher <erik.hatc...@gmail.com>:
>
>> Kostali -
>>
>> See if the "Introspect rich document parsing and extraction” section of
>> http://lucidworks.com/blog/2015/08/04/solr-5-new-binpost-utility/
>> helps*.  You’ll be able to see the output of /update/extract (aka Tika) and
>> adjust your mappings and configurations accordingly.
>>
>> * And apologies that bin/post isn’t Windows savvy at this point, but
>> you’ve got the hang of the Windows-compatible command-line it looks like.
>>
>> —
>> Erik Hatcher, Senior Solutions Architect
>> http://www.lucidworks.com
>>
>>
>>
>> > On Dec 4, 2015, at 11:44 AM, kostali hassan <med.has.kost...@gmail.com>
>> wrote:
>> >
>> > thank you Erick, i follow you advice and take a look to config apache
>> tika,
>> > I have modifie my request handler /update/extract:
>> >
>> > > >  startup="lazy"
>> >  class="solr.extraction.ExtractingRequestHandler" >
>> >
>> >  last_modified
>> >  ignored_
>> >
>> >  
>> >  true
>> >  links
>> >  ignored_
>> >
>> > > >
>> name="tika.config">D:\solr\solr-5.3.1\server\solr\tika-data-config.xml
>> >  
>> >
>> > and config tika :
>> >
>> > dataConfig>
>> >
>> >
>> >> > dataSource="null" rootEntity="false"
>> >baseDir="D:\Lucene\document"
>> > fileName=".*.(doc)|(pdf)|(docx)"
>> > onError="skip"
>> >recursive="true">
>> >
>> >
>> >
>> >
>> >   > >name="documentImport"
>> >processor="TikaEntityProcessor"
>> >url="${files.fileAbsolutePath}"
>> >format="text">
>> >
>> >
>> >
>> > 
>> >
>> >> > meta="true"/>
>> >> > meta="true"/>
>> >
>> >
>> >
>> > 
>> >
>> > and schema.xml:
>> >
>> > 
>> >
>> >
>> >
>> > but the prb is the same title of indexed files is wrong for msword
>>
>>
>


Re: schema fileds and Typefield in solr-5.3.1

2015-12-04 Thread kostali hassan
Thank you Erick, I followed your advice and took a look at configuring Apache
Tika. I have modified my request handler /update/extract:

<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.Last-Modified">last_modified</str>
    <str name="uprefix">ignored_</str>

    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>

    <str name="tika.config">D:\solr\solr-5.3.1\server\solr\tika-data-config.xml</str>
  </lst>
</requestHandler>

and configured Tika:

<dataConfig>
  <document>
    <entity name="files" processor="FileListEntityProcessor"
            dataSource="null" rootEntity="false"
            baseDir="D:\Lucene\document"
            fileName=".*.(doc)|(pdf)|(docx)"
            onError="skip"
            recursive="true">
      <entity name="documentImport"
              processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}"
              format="text">
      </entity>
    </entity>
  </document>
</dataConfig>

and updated schema.xml accordingly, but the problem is the same: the title of
the indexed files is wrong for MS Word documents.


Re: schema fileds and Typefield in solr-5.3.1

2015-12-04 Thread kostali hassan
Thank you; that's why I chose to add the exact values using the Solarium PHP
client, but the timeout stops indexing after 30 seconds:

$dir = new Folder($dossier);
$files = $dir->find('.*\.*');
foreach ($files as $file) {
$file = new File($dir->pwd() . DS . $file);

$query = $client->createExtract();
$query->setFile($file->pwd());
$query->setCommit(true);
$query->setOmitHeader(false);

$doc = $query->createDocument();
$doc->id =$file->pwd();
$doc->name = $file->name;
$doc->title = $file->name();

$query->setDocument($doc);
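The 30-second cut-off described above is usually a client-side HTTP timeout rather than a Solr limit. As a sanity check outside PHP, the same extract request can be sent with curl and a generous timeout; host, core ("demo"), document id and file path below are examples:

```shell
# Send one file through /update/extract with a 5-minute client timeout
curl --max-time 300 \
  "http://localhost:8983/solr/demo/update/extract?literal.id=doc1&commit=true" \
  -F "myfile=@/path/to/file.pdf"
```

If this succeeds on a file the PHP loop times out on, the fix belongs in the HTTP client configuration, not in Solr.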

2015-12-04 16:50 GMT+00:00 Erik Hatcher <erik.hatc...@gmail.com>:

> Kostali -
>
> See if the "Introspect rich document parsing and extraction” section of
> http://lucidworks.com/blog/2015/08/04/solr-5-new-binpost-utility/
> helps*.  You’ll be able to see the output of /update/extract (aka Tika) and
> adjust your mappings and configurations accordingly.
>
> * And apologies that bin/post isn’t Windows savvy at this point, but
> you’ve got the hang of the Windows-compatible command-line it looks like.
>
> —
> Erik Hatcher, Senior Solutions Architect
> http://www.lucidworks.com
>
>
>
> > On Dec 4, 2015, at 11:44 AM, kostali hassan <med.has.kost...@gmail.com>
> wrote:
> >
> > thank you Erick, i follow you advice and take a look to config apache
> tika,
> > I have modifie my request handler /update/extract:
> >
> >  >  startup="lazy"
> >  class="solr.extraction.ExtractingRequestHandler" >
> >
> >  last_modified
> >  ignored_
> >
> >  
> >  true
> >  links
> >  ignored_
> >
> >  >
> name="tika.config">D:\solr\solr-5.3.1\server\solr\tika-data-config.xml
> >  
> >
> > and config tika :
> >
> > dataConfig>
> >
> >
> > > dataSource="null" rootEntity="false"
> >baseDir="D:\Lucene\document"
> > fileName=".*.(doc)|(pdf)|(docx)"
> > onError="skip"
> >recursive="true">
> >
> >
> >
> >
> >>name="documentImport"
> >processor="TikaEntityProcessor"
> >url="${files.fileAbsolutePath}"
> >format="text">
> >
> >
> >
> > 
> >
> > > meta="true"/>
> > > meta="true"/>
> >
> >
> >
> > 
> >
> > and schema.xml:
> >
> > 
> >
> >
> >
> > but the prb is the same title of indexed files is wrong for msword
>
>


schema fileds and Typefield in solr-5.3.1

2015-12-03 Thread kostali hassan
I started working with Solr 5.x by extracting Solr into D:\solr and running
the Solr server with:

D:\solr\solr-5.3.1\bin>solr start ;

Then I created a core in standalone mode:

D:\solr\solr-5.3.1\bin>solr create -c mycore

I need to index filesystem documents (Word and PDF), and the schema API doesn't
have a "name" field for documents, so I added this field using curl:

curl -X POST -H 'Content-type:application/json' --data-binary '{

  "add-field":{

 "name":"name",

 "type":"text_general",

 "stored":true,

 "indexed":true }

}' http://localhost:8983/solr/mycore/schema



And re-indexed all documents with the Windows SimplePostTool:

D:\solr\solr-5.3.1>java -classpath example\exampledocs\post.jar -Dauto=yes
-Dc=mycore -Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool
D:\Lucene\document ;



But even though the field "name" was successfully added, it is empty; the
"title" field gets the name only for PDF documents, not for MS Word (.doc and
.docx).



Then I chose to index with the techproducts example, because it doesn't use the
managed-schema API, so I can modify schema.xml directly:



D:\solr\solr-5.3.1>solr -e techproducts



Techproducts returns the name of all the indexed XML files;



Then I created a new core based on the solr_home example/techproducts/solr, and
I used the schema.xml (which contains the field "name") and solrconfig.xml from
techproducts in this new core, called demo.

When I indexed all the documents, the field "name" exists but is still empty
for every indexed document.



My question is how I can get just the name of each document (MS Word and PDF),
not the path as in the "id" or "resource_name" field; do I have to create a new
field type, or is there another way?



Sorry for my basic English.

Thank you.
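One way to verify that the add-field call above took effect is to read the field back through the Schema API; the core name is an example:

```shell
# Fetch the definition of the "name" field from the managed schema
curl "http://localhost:8983/solr/mycore/schema/fields/name"
```

The response should echo back the type/stored/indexed settings that were posted.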


Re: curl adapter in solarium 3x

2015-12-03 Thread kostali hassan
Thank you Gora; in fact, cURL is the default adapter for Solarium 3.x, and I am
not using the Zend Framework.

2015-12-03 11:05 GMT+00:00 Gora Mohanty <g...@mimirtech.com>:

> On 3 December 2015 at 16:20, kostali hassan <med.has.kost...@gmail.com>
> wrote:
> > How can I force the connection to close explicitly when it has finished
> > processing, rather than being pooled for reuse?
> > Is there a way to tell the server? The server may send a keep-alive
> > timeout (with a default Apache install, it is 15 seconds or 100 requests,
> > whichever comes first), but cURL will just open another connection when
> > that happens.
>
> These questions seem no longer relevant to the Solr mailing list.
> Please ask on a Solarium mailing list.
>
> In response to your earlier message,  I had sent you a link to the
> Solarium ZendHttpAdapter which seems to allow keepalive, unlike the
> curl adapter. Here it is again:
> http://wiki.solarium-project.org/index.php/V1:Client_adapters . You
> might also find this useful:
> http://framework.zend.com/manual/1.12/en/zend.http.client.advanced.html
>
> Regards,
> Gora
>


curl adapter in solarium 3x

2015-12-03 Thread kostali hassan
How can I force the connection to close explicitly when it has finished
processing, rather than being pooled for reuse?
Is there a way to tell the server? The server may send a keep-alive timeout
(with a default Apache install, it is 15 seconds or 100 requests, whichever
comes first), but cURL will just open another connection when that happens.

This is my CakePHP function to index rich data from the file system:


App::import('Vendor', 'autoload', array('file' => 'solarium/vendor/autoload.php'));

public function indexDocument() {
    $config = array(
        "endpoint" => array(
            "localhost" => array(
                "host" => "127.0.0.1",
                "port" => "8983",
                "path" => "/solr",
                "core" => "demo",
            ),
        ),
    );
    $start = microtime(true);

    if ($_POST) {
        // create a client instance
        $client = new Solarium\Client($config);
        $dossier = $this->request->data['User']['dossier'];
        $dir = new Folder($dossier);
        $files = $dir->find('.*\.*');

        $headers = array('Content-Type:multipart/form-data');

        foreach ($files as $file) {
            $file = new File($dir->pwd() . DS . $file);

            $query = $client->createExtract();
            $query->setFile($file->pwd());
            $query->setCommit(true);
            $query->setOmitHeader(false);

            $doc = $query->createDocument();
            $doc->id = $file->pwd();
            $doc->name = $file->name;
            $doc->title = $file->name();

            $query->setDocument($doc);

            $request = $client->createRequest($query);
            $request->addHeaders($headers);

            $result = $client->executeRequest($request);
        }
    }

    $this->set(compact('start'));
}


Re: indexing rich data from directory using solarium

2015-12-02 Thread kostali hassan
The problem with posting from the command line is:

I started working with Solr 5.3.1 by extracting Solr into D:\solr and running
the Solr server with:

D:\solr\solr-5.3.1\bin>solr start

Then I created a core in standalone mode:

D:\solr\solr-5.3.1\bin>solr create -c mycore

I need to index file-system documents (Word and PDF), and the default schema
does not have a "name" field for the document, so I added this field using curl:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field":{
     "name":"name",
     "type":"text_general",
     "stored":true,
     "indexed":true }
}' http://localhost:8983/solr/mycore/schema

Then I re-indexed all the documents with the Windows SimplePostTool:

D:\solr\solr-5.3.1>java -classpath example\exampledocs\post.jar -Dauto=yes
-Dc=mycore -Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool
D:\Lucene\document

But even though the "name" field was successfully added, it is empty; the
"title" field gets the file name only for PDF documents, not for MS Word
(.doc and .docx) files.

Then I chose to index with the techproducts example, because it does not use
the managed-schema API, so I can modify the schema directly:

D:\solr\solr-5.3.1>solr -e techproducts

techproducts returns the name of every .xml file indexed.

Then I created a new core, called demo, based on the solr home of
example/techproducts/solr, reusing its schema.xml (which contains the "name"
field) and solrconfig.xml.

When I indexed all the documents, the "name" field existed but was still
empty for every indexed document.

My question is: how can I get just the name of each document (MS Word and
PDF) rather than the path, as stored in the "id" or "resource_name" fields?
Do I have to create a new field type, or is there another way?

2015-12-02 16:25 GMT+00:00 kostali hassan <med.has.kost...@gmail.com>:

> Yes, there is an error in my Solr logs:
> SolrException URLDecoder: Invalid character encoding detected after
> position 79 of query string / form data (while parsing as UTF-8)
> <http://stackoverflow.com/questions/34017889/solrexception-urldecoder-invalid-character-encoding-detected-after-position-79>
> This is my post on Stack Overflow:
>
> http://stackoverflow.com/questions/34017889/solrexception-urldecoder-invalid-character-encoding-detected-after-position-79
>
> 2015-12-02 16:18 GMT+00:00 Gora Mohanty <g...@mimirtech.com>:
>
>> On 2 December 2015 at 17:16, kostali hassan <med.has.kost...@gmail.com>
>> wrote:
> > Yes, that makes sense, thank you, but I want to understand why the same
> > data indexes fine from the shell using the Windows SimplePostTool:
>> >>
>> >> D:\solr\solr-5.3.1>java -classpath example\exampledocs\post.jar
>> -Dauto=yes
>> >> -Dc=solr_docs_core -Ddata=files -Drecursive=yes
>> >> org.apache.solr.util.SimplePostTool D:\Lucene\document ;
>>
>> That seems strange. Are you sure that you are posting the same PDF?
>> With SimplePostTool, you should be POSTing to the URL
>> /solr/update/extract?literal.id=myid , i.e., you need an option of
>> something like:
>> -Durl=http://localhost:8983/solr/update/extract?literal.id=myid in the
>> command line for SimplePostTool.
>>
>> Likewise, I am not that familiar with Solarium. Are you sure that the
>> file is being POSTed to /solr/update/extract? Are you seeing any
>> errors in your Solr logs?
>>
>> Regards,
>> Gora
>>
>
>
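The -Durl=.../update/extract?literal.id=... option suggested above boils down
to building an extract URL whose literal.* request parameters set field
values directly on the indexed document. A hedged sketch of the URL
construction (core name and file path are illustrative, and a real path
should be URL-encoded first):

```shell
# build the extract URL by hand; each literal.<field> parameter becomes a
# field value on the extracted document
FILE_ID="D:/Lucene/document/report.pdf"
URL="http://localhost:8983/solr/demo/update/extract?literal.id=${FILE_ID}&commit=true"
echo "$URL"
# the file itself would then be posted with something like:
# curl "$URL" -F "myfile=@report.pdf"
```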


Re: indexing rich data from directory using solarium

2015-12-02 Thread kostali hassan
Yes, there is an error in my Solr logs:
SolrException URLDecoder: Invalid character encoding detected after
position 79 of query string / form data (while parsing as UTF-8)
<http://stackoverflow.com/questions/34017889/solrexception-urldecoder-invalid-character-encoding-detected-after-position-79>
This is my post on Stack Overflow:
http://stackoverflow.com/questions/34017889/solrexception-urldecoder-invalid-character-encoding-detected-after-position-79

2015-12-02 16:18 GMT+00:00 Gora Mohanty <g...@mimirtech.com>:

> On 2 December 2015 at 17:16, kostali hassan <med.has.kost...@gmail.com>
> wrote:
> > Yes, that makes sense, thank you, but I want to understand why the same
> > data indexes fine from the shell using the Windows SimplePostTool:
> >>
> >> D:\solr\solr-5.3.1>java -classpath example\exampledocs\post.jar
> -Dauto=yes
> >> -Dc=solr_docs_core -Ddata=files -Drecursive=yes
> >> org.apache.solr.util.SimplePostTool D:\Lucene\document ;
>
> That seems strange. Are you sure that you are posting the same PDF?
> With SimplePostTool, you should be POSTing to the URL
> /solr/update/extract?literal.id=myid , i.e., you need an option of
> something like:
> -Durl=http://localhost:8983/solr/update/extract?literal.id=myid in the
> command line for SimplePostTool.
>
> Likewise, I am not that familiar with Solarium. Are you sure that the
> file is being POSTed to /solr/update/extract? Are you seeing any
> errors in your Solr logs?
>
> Regards,
> Gora
>


Re: indexing rich data from directory using solarium

2015-12-02 Thread kostali hassan
I fixed it, but there is still a small problem with the 30-second timeout of
the WAMP server, so I can only index about 130 files from a directory at a
time until all my files are indexed. This is my index-document function:

App::import('Vendor', 'autoload', array('file' => 'solarium/vendor/autoload.php'));

public function indexDocument() {
    $config = array(
        "endpoint" => array(
            "localhost" => array(
                "host" => "127.0.0.1",
                "port" => "8983",
                "path" => "/solr",
                "core" => "demo",
            ),
        ),
    );
    $start = microtime(true);

    if ($_POST) {
        // create a client instance
        $client = new Solarium\Client($config);
        $dossier = $this->request->data['User']['dossier'];
        $dir = new Folder($dossier);
        $files = $dir->find('.*\.*');

        $headers = array('Content-Type:multipart/form-data');

        foreach ($files as $file) {
            $file = new File($dir->pwd() . DS . $file);

            $query = $client->createExtract();
            $query->setFile($file->pwd());
            $query->setCommit(true);
            $query->setOmitHeader(false);

            $doc = $query->createDocument();
            $doc->id = $file->pwd();
            $doc->name = $file->name;
            $doc->title = $file->name();

            $query->setDocument($doc);

            $request = $client->createRequest($query);
            $request->addHeaders($headers);

            $result = $client->executeRequest($request);
        }
    }

    $this->set(compact('start'));
}
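For what it is worth, the 30-second ceiling described above is PHP's
max_execution_time (the WAMP default); rather than feeding the indexer
batches of ~130 files, one workaround is to raise that limit for the
indexing action. A sketch of the php.ini change (300 is only an illustrative
value; calling set_time_limit(0) inside the action is an alternative):

```ini
; php.ini – allow long-running indexing requests to exceed 30 seconds
max_execution_time = 300
```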


2015-12-02 16:42 GMT+00:00 kostali hassan <med.has.kost...@gmail.com>:

> Yes, I am sure, because I successfully posted the same documents (455 .doc,
> .docx and PDF files in 18 seconds) with SimplePostTool.
> But now I want to communicate directly with my Solr server using Solarium
> in my CakePHP application; I think the only way to get the right encoding
> is in the header:
> $headers = array('Content-Type:multipart/form-data');
> I guess it will work if the indexing time does not exceed the 30-second
> timeout of the WAMP server.
>
> 2015-12-02 16:32 GMT+00:00 Gora Mohanty <g...@mimirtech.com>:
>
>> On 2 December 2015 at 21:55, kostali hassan <med.has.kost...@gmail.com>
>> wrote:
>> > yes they are a Error in my solr logs:
>> > SolrException URLDecoder: Invalid character encoding detected after
>> > position 79 of query string / form data (while parsing as UTF-8)
>> > <
>> http://stackoverflow.com/questions/34017889/solrexception-urldecoder-invalid-character-encoding-detected-after-position-79
>> >
>> > this is my post in stack overflow :
>> >
>> http://stackoverflow.com/questions/34017889/solrexception-urldecoder-invalid-character-encoding-detected-after-position-79
>>
>> Looks like an encoding error all right. Are you very sure that you can
>> successfully POST the same document with SimplePostTool? If so, I would
>> guess that you are not using Solarium correctly, i.e., the PDF file is
>> getting POSTed such that Solr is getting the raw content rather than
>> the extracted content.
>>
>> Regards,
>> Gora
>>
>
>


Solr extract performance

2015-12-02 Thread kostali hassan
I am looking for the optimal way to extract and commit rich data from a
directory containing many MS Word and PDF files, because I have a problem
with the 30-second timeout in the WAMP server.
This is my index-document function in CakePHP using Solarium:

App::import('Vendor', 'autoload', array('file' => 'solarium/vendor/autoload.php'));

public function indexDocument() {
    $config = array(
        "endpoint" => array(
            "localhost" => array(
                "host" => "127.0.0.1",
                "port" => "8983",
                "path" => "/solr",
                "core" => "demo",
            ),
        ),
    );
    $start = microtime(true);

    if ($_POST) {
        // create a client instance
        $client = new Solarium\Client($config);
        $dossier = $this->request->data['User']['dossier'];
        $dir = new Folder($dossier);
        $files = $dir->find('.*\.*');

        $headers = array('Content-Type:multipart/form-data');

        foreach ($files as $file) {
            $file = new File($dir->pwd() . DS . $file);

            $query = $client->createExtract();
            $query->setFile($file->pwd());
            $query->setCommit(true);
            $query->setOmitHeader(false);

            $doc = $query->createDocument();
            $doc->id = $file->pwd();
            $doc->name = $file->name;
            $doc->title = $file->name();

            $query->setDocument($doc);

            $request = $client->createRequest($query);
            $request->addHeaders($headers);

            $result = $client->executeRequest($request);
        }
    }

    $this->set(compact('start'));
}


Re: indexing rich data from directory using solarium

2015-12-02 Thread kostali hassan
Yes, I am sure, because I successfully posted the same documents (455 .doc,
.docx and PDF files in 18 seconds) with SimplePostTool.
But now I want to communicate directly with my Solr server using Solarium in
my CakePHP application; I think the only way to get the right encoding is in
the header:
$headers = array('Content-Type:multipart/form-data');
I guess it will work if the indexing time does not exceed the 30-second
timeout of the WAMP server.

2015-12-02 16:32 GMT+00:00 Gora Mohanty <g...@mimirtech.com>:

> On 2 December 2015 at 21:55, kostali hassan <med.has.kost...@gmail.com>
> wrote:
> > yes they are a Error in my solr logs:
> > SolrException URLDecoder: Invalid character encoding detected after
> > position 79 of query string / form data (while parsing as UTF-8)
> > <
> http://stackoverflow.com/questions/34017889/solrexception-urldecoder-invalid-character-encoding-detected-after-position-79
> >
> > this is my post in stack overflow :
> >
> http://stackoverflow.com/questions/34017889/solrexception-urldecoder-invalid-character-encoding-detected-after-position-79
>
> Looks like an encoding error all right. Are you very sure that you can
> successfully POST the same document with SimplePostTool? If so, I would
> guess that you are not using Solarium correctly, i.e., the PDF file is
> getting POSTed such that Solr is getting the raw content rather than
> the extracted content.
>
> Regards,
> Gora
>


indexing rich data from directory using solarium

2015-12-02 Thread kostali hassan
How can I index rich data (MS Word and PDF files) from a directory containing
many files with Solarium? My config is:

$config = array(
 "endpoint" => array("localhost" => array("host"=>"127.0.0.1",
 "port"=>"8983", "path"=>"/solr", "core"=>"demo",)
) );

I tried this code:

$dir = new Folder($dossier);
$files = $dir->find('.*\.*');
foreach ($files as $file) {
    $file = new File($dir->pwd() . DS . $file);

    $update = $client->createUpdate();

    $query = $client->createExtract();
    $query->setFile($file->pwd());
    $query->setCommit(true);
    $query->setOmitHeader(false);
    $doc = $query->createDocument();
    $doc->id = $file->pwd();
    $doc->name = $file->name;
    $doc->title = $file->name();
    $query->setDocument($doc);

    $result = $client->extract($query);
}

When I execute it, I get this error:

org.apache.solr.common.SolrException: URLDecoder: Invalid character
encoding detected after position 79 of query string / form data (while
parsing as UTF-8)


Re: indexing rich data from directory using solarium

2015-12-02 Thread kostali hassan
Yes, that makes sense, thank you, but I want to understand why the same data
indexes fine from the shell using the Windows SimplePostTool:
>
> D:\solr\solr-5.3.1>java -classpath example\exampledocs\post.jar -Dauto=yes
> -Dc=solr_docs_core -Ddata=files -Drecursive=yes
> org.apache.solr.util.SimplePostTool D:\Lucene\document ;



2015-12-02 11:09 GMT+00:00 Gora Mohanty <g...@mimirtech.com>:

> On 2 December 2015 at 16:32, kostali hassan <med.has.kost...@gmail.com>
> wrote:
> [...]
> >
> > When I execute it, I get this error:
> >
> > org.apache.solr.common.SolrException: URLDecoder: Invalid character
> > encoding detected after position 79 of query string / form data (while
> > parsing as UTF-8)
>
> Solr expects UTF-8 data. Your documents are probably in some different
> encoding. You will need to figure out what the encoding is, and how to
> convert it to UTF-8.
>
> Regards,
> Gora
>
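The "convert it to UTF-8" step above can be tried out before touching the
PHP code; a small sketch with iconv, assuming purely for illustration that
the offending bytes are ISO-8859-1 (the real encoding has to be verified
first, e.g. with file -i):

```shell
# write a small ISO-8859-1 sample (\351 is é in Latin-1), then re-encode it
printf 'r\351sum\351' > /tmp/latin1-sample.txt
iconv -f ISO-8859-1 -t UTF-8 /tmp/latin1-sample.txt
# → résumé
```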


Fwd: Indexing rich data (msword and pdf) in apache solr-5.3.1

2015-12-01 Thread kostali hassan
I started working with Solr 5.x by extracting Solr into D:\solr and running
the Solr server with:

D:\solr\solr-5.3.1\bin>solr start

Then I created a core in standalone mode:

D:\solr\solr-5.3.1\bin>solr create -c mycore

I need to index file-system documents (Word and PDF), and the default schema
does not have a "name" field for the document, so I added this field using curl:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field":{
     "name":"name",
     "type":"text_general",
     "stored":true,
     "indexed":true }
}' http://localhost:8983/solr/mycore/schema

Then I re-indexed all the documents with the Windows SimplePostTool:

D:\solr\solr-5.3.1>java -classpath example\exampledocs\post.jar -Dauto=yes
-Dc=mycore -Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool
D:\Lucene\document

But even though the "name" field was successfully added, it is empty; the
"title" field gets the file name only for PDF documents, not for MS Word
(.doc and .docx) files.

Then I chose to index with the techproducts example, because it does not use
the managed-schema API, so I can modify the schema directly:

D:\solr\solr-5.3.1>solr -e techproducts

techproducts returns the name of every .xml file indexed.

Then I created a new core, called demo, based on the solr home of
example/techproducts/solr, reusing its schema.xml (which contains the "name"
field) and solrconfig.xml.

When I indexed all the documents, the "name" field existed but was still
empty for every indexed document.

My question is: how can I get just the name of each document (MS Word and
PDF) rather than the path, as stored in the "id" or "resource_name" fields?
Do I have to create a new field type, or is there another way?



Sorry for my basic English.

Thank you.


Fwd: index rich data with solarium php solr Client

2015-12-01 Thread kostali hassan
I get this error:
Invalid character encoding detected after position 79 of query string /
form data (while parsing as UTF-8)

This is my function to index rich data from a directory containing many
files (MS Word and PDF):

$config = array(
    "endpoint" => array(
        "localhost" => array(
            "host" => "127.0.0.1",
            "port" => "8983",
            "path" => "/solr",
            "core" => "demo",
        ),
    ),
);
$status = '00';
$time = '00';

if ($_POST) {
    // create a client instance
    $client = new Solarium\Client($config);
    $dossier = $this->request->data['User']['dossier'];
    $dir = new Folder($dossier);
    $files = $dir->find('.*\.*');

    foreach ($files as $file) {
        $file = new File($dir->pwd() . DS . $file);

        $query = $client->createExtract();

        $query->setFile($file->pwd());
        $query->setCommit(true);
        $query->setOmitHeader(false);

        $doc = $query->createDocument();
        $doc->id = $file->pwd();
        $doc->name = $file->name;
        $doc->title = $file->name();

        $query->setDocument($doc);

        $result = $client->extract($query);
    }

    $status = $result->getStatus();
    $time = $result->getQueryTime();
}

$this->set(compact('time', 'status'));
}