Hi,

I am using elasticsearch-1.3.2 on Windows 8 for evaluation purposes, with the following plug-in and JVM:

Plug-in
-----
name: mapper-attachments
version: 2.3.1
description: Adds the attachment type allowing to parse difference attachment formats
jvm: true
site: false

JVM
-----
version: 1.7.0_67
vm_name: Java HotSpot(TM) Client VM
vm_version: 24.65-b04
vm_vendor: Oracle Corporation

I have created the following mapping:

    {
      "myIndex": {
        "mappings": {
          "dokument": {
            "properties": {
              "created": { "type": "date", "format": "dateOptionalTime" },
              "description": { "type": "string" },
              "file": {
                "type": "attachment",
                "path": "full",
                "fields": {
                  "file": { "type": "string", "store": true, "term_vector": "with_positions_offsets" },
                  "author": { "type": "string" },
                  "title": { "type": "string" },
                  "name": { "type": "string" },
                  "date": { "type": "date", "format": "dateOptionalTime" },
                  "keywords": { "type": "string" },
                  "content_type": { "type": "string" },
                  "content_length": { "type": "integer" },
                  "language": { "type": "string" }
                }
              },
              "id": { "type": "string" },
              "title": { "type": "string" }
            }
          }
        }
      }
    }

Because I would like to use ES from C#/.NET, I have written a small C# app that reads a file from the hard drive as a base64-encoded stream and puts the document into the ES index. I am working with this POST request:

    {
      "id": "8dbf1d73-44d1-4e20-aa35-13b18ddf5057",
      "title": "Test",
      "description": "Test Description",
      "created": "2014-01-20T19:04:20.1019885+01:00",
      "file": {
        "_content_type": "application/pdf",
        "_name": "Test.pdf",
        "content": "---my base64 stuff here---"
      }
    }

and send it as an index command to ES like this:

    myIndex/dokument/8dbf1d73-44d1-4e20-aa35-13b18ddf5057?refresh=true

After that I query ES with this request:

    {
      "fields": [],
      "query": { "match": { "file": "test" } },
      "highlight": { "fields": { "file": {} } }
    }

If my input is a *.pdf or *.txt file, everything works as expected: the mapper-attachments plug-in recognizes the content of the document, and the occurrences of the string "test" that I am searching for are highlighted in the results.

I have searched for hours for a way to do the same with Microsoft Office documents, but I am not able to get it to work.
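For readers who want to reproduce the request without the C# app, here is a minimal sketch in Python of how the index body, the index path and the search body shown above could be assembled. It is an illustration only, not the code I actually run: the function names (build_index_body, index_path, build_search_body) are hypothetical, and the sketch only builds the JSON, it does not send anything to ES.

```python
import base64
import json


def build_index_body(doc_id, title, description, created,
                     file_name, content_type, raw_bytes):
    """Build the JSON body for one document with an attachment.

    Hypothetical helper: the field names mirror the request shown in
    the post, everything else is an assumption for illustration.
    """
    return {
        "id": doc_id,
        "title": title,
        "description": description,
        "created": created,
        "file": {
            "_content_type": content_type,
            "_name": file_name,
            # The file bytes are sent as base64 text in "content".
            "content": base64.b64encode(raw_bytes).decode("ascii"),
        },
    }


def index_path(index, doc_type, doc_id):
    """Return the relative path used for the index command."""
    return "{}/{}/{}?refresh=true".format(index, doc_type, doc_id)


def build_search_body(term):
    """Build the match query with highlighting on the 'file' field."""
    return {
        "fields": [],
        "query": {"match": {"file": term}},
        "highlight": {"fields": {"file": {}}},
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for the real file content.
    body = build_index_body(
        "8dbf1d73-44d1-4e20-aa35-13b18ddf5057", "Test", "Test Description",
        "2014-01-20T19:04:20.1019885+01:00", "Test.pdf",
        "application/pdf", b"placeholder bytes")
    print(index_path("myIndex", "dokument", body["id"]))
    print(json.dumps(body, indent=2))
    print(json.dumps(build_search_body("test"), indent=2))
```

Printing the body this way makes it easy to compare what is sent for a PDF with what is sent for an Office file; apart from the base64 content, "_content_type" and "_name" should be the only differences.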
ES does not report any error when I add the documents, but I cannot find the content of my Office documents. Can anyone please help me and give me a sample of how to index *.doc, *.docx, *.xls, *.xlsx etc.? I have tried to give ES a hint for the content type / MIME type based on this link http://filext.com/faq/office_mime_types.php but this makes no difference.

Thanks in advance!
Dirk
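For reference, a small sketch in Python of the content-type hint mentioned above: an explicit lookup from file extension to the MIME type used as "_content_type". The MIME strings are the standard registered ones for these formats; the helper name content_type_for and the fallback value are assumptions made for illustration. This only rules out a misspelled content type. It does not show that the hint changes what the plug-in extracts, and in my tests it did not.

```python
import os

# Standard MIME types for the formats mentioned in the post.
MIME_BY_EXTENSION = {
    ".pdf": "application/pdf",
    ".txt": "text/plain",
    ".doc": "application/msword",
    ".docx": "application/vnd.openxmlformats-officedocument"
             ".wordprocessingml.document",
    ".xls": "application/vnd.ms-excel",
    ".xlsx": "application/vnd.openxmlformats-officedocument"
             ".spreadsheetml.sheet",
}

# Assumed fallback for extensions that are not in the table.
DEFAULT_MIME = "application/octet-stream"


def content_type_for(file_name):
    """Return the MIME type for a file name, ignoring case."""
    extension = os.path.splitext(file_name)[1].lower()
    return MIME_BY_EXTENSION.get(extension, DEFAULT_MIME)


if __name__ == "__main__":
    for name in ("Test.pdf", "Report.docx", "Budget.XLSX", "notes.md"):
        print(name, "->", content_type_for(name))
```

An explicit table is used instead of a system lookup so that the result does not depend on what is registered on the machine running the client.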