Re: Oracle JDBC Job Error

2020-01-06 Thread Cihad Guzel
Hi,

I have debugged the MCF 2.15 code and found the problem.

JDBCConnector.java, line 270:

Object o = row.getValue(JDBCConstants.idReturnColumnName);

if (o == null)
  throw new ManifoldCFException("Bad seed query; doesn't return
$(IDCOLUMN) column.  Try using quotes around $(IDCOLUMN) variable,
e.g. \"$(IDCOLUMN)\", or, for MySQL, select \"by label\" in your
repository connection.");


The "row" object's value is "LCF__ID" -> this is a uppercase string

"JDBCConstants.idReturnColumnName" is "lcf__id" -> this is a lowercase string

So "o" object is null.

I think that Oracle returns the uppercase column name. It is not a
bug. How can I fix it? Should I update the seed query in the Query
tab? Should we change the code lines?
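
If I read the error message's own hint correctly, one workaround on my side
would be to quote the alias in the seed query, e.g. SELECT PERSONID AS
"$(IDCOLUMN)" FROM PERSON, so that Oracle keeps the lowercase label. On the
code side, a possible change (just a sketch, not a tested patch) would be to
fall back to an uppercase lookup before throwing:

Object o = row.getValue(JDBCConstants.idReturnColumnName);

// Sketch only: if the driver reports the column label in uppercase
// (as Oracle appears to do), retry the lookup with the uppercase form
// before throwing the existing "Bad seed query" ManifoldCFException.
if (o == null)
  o = row.getValue(JDBCConstants.idReturnColumnName.toUpperCase(java.util.Locale.ROOT));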

Regards,
Cihad Guzel


Cihad Guzel wrote the following on Sun, Jan 5, 2020 at 20:14:

> Hi,
>
> I try JDBC connector with Oracle (version: 11.2.0.4). I added to classpath
> ojdbc6.jar. My seed query as follows:
>
> "SELECT PERSONID AS $(IDCOLUMN) FROM PERSON"
>
> and I have an error as follow:
>
> "Error: Bad seed query; doesn't return $(IDCOLUMN) column. Try using
> quotes around $(IDCOLUMN) variable, e.g. "$(IDCOLUMN)", or, for MySQL,
> select "by label" in your repository connection."
>
> I have tried JDBC connector with MsSQL and Mysql. It has run successfully.
>
> How can I fix it?
>
> Regards,
> Cihad Guzel
>


Oracle JDBC Job Error

2020-01-05 Thread Cihad Guzel
Hi,

I am trying the JDBC connector with Oracle (version 11.2.0.4). I added
ojdbc6.jar to the classpath. My seed query is as follows:

"SELECT PERSONID AS $(IDCOLUMN) FROM PERSON"

and I get the following error:

"Error: Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes
around $(IDCOLUMN) variable, e.g. "$(IDCOLUMN)", or, for MySQL, select "by
label" in your repository connection."

I have tried the JDBC connector with MS SQL Server and MySQL, and it ran successfully.

How can I fix it?

Regards,
Cihad Guzel


Re: Solr Repository Connector

2019-08-05 Thread Cihad Guzel
Hi Dileepa,

You can find the full list of MCF connectors at
https://manifoldcf.apache.org/release/release-2.13/en_US/included-connectors.html

MCF has a Solr output connector; it is not a repository connector. If you
want to use Solr as a repository, you will need to write a new repository
connector.

Regards,
Cihad Guzel


Dileepa Jayakody wrote the following on Mon, Aug 5, 2019 at 13:18:

> Hi All,
>
> I'm working on a project which needs to implement a federated search
> solution with heterogeneous data repositories. One repository is a Solr
> index. I would like to use ManifoldCF as the data ingestion engine in this
> project as I have worked with MCF before.
>
> Does ManifoldCF has a Solr repository connector which I can use here? Or
> will I need to implement a new repository connector for Solr?
> Any guidance here is much appreciated.
>
> Thanks,
> Dileepa
>


Re: Some jobs are waiting in "stopping" status

2019-07-13 Thread Cihad Guzel
In addition, all my jobs are now waiting in "End notification" status.

Cihad Güzel


Cihad Guzel wrote the following on Sat, Jul 13, 2019 at 22:27:

> Hi Karl,
>
> No, this is different new setup. But I use same version for mfc and
> database. I am tring new setup for my every testing. I didn't see any
> repeated or non-repeated error logs like before.
>
> Then, I have build the jdbc connector from trunk branch and changed the
> jdbc-connector.jar with new version. Now, There are non-repeated new error
> log as follow:
>
> ERROR 2019-07-13T16:20:34,259 (Seeding thread) - Exception tossed:
> Unexpected job status: 33
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected job
> status: 33
> at
> org.apache.manifoldcf.crawler.jobs.JobManager.resetSeedJob(JobManager.java:7934)
> ~[mcf-pull-agent.jar:?]
> at
> org.apache.manifoldcf.crawler.system.SeedingThread.run(SeedingThread.java:242)
> [mcf-pull-agent.jar:?]
>
> Cihad Guzel
>
>
> Karl Wright , 13 Tem 2019 Cmt, 18:46 tarihinde şunu
> yazdı:
>
>> You previously reported errors of the kind that ManifoldCF throws when it
>> finds that the database seemingly lost transactional integrity.
>> My question is whether you are still using the same database setup where
>> you previously got those errors?
>>
>> The ArrayIndexOutOfBounds exception applied to JDBC connector metadata
>> indexing.  If you are using the JDBC connector with metadata without the
>> patch then it could explain your problem also.  But you would be seeing
>> exceptions thrown over and over again in your logs.
>>
>> Karl
>>
>>
>> On Sat, Jul 13, 2019 at 11:12 AM Cihad Guzel  wrote:
>>
>>> Hi Karl,
>>>
>>> If you are talking about is
>>> https://issues.apache.org/jira/browse/CONNECTORS-1613, my setup doesn't
>>> include this change because I use mfc 2.12. Are you suggesting I use a
>>> trunk version?
>>>
>>> Cihad Güzel
>>>
>>> Karl Wright , 13 Tem 2019 Cmt, 17:27 tarihinde şunu
>>> yazdı:
>>>
>>>> Is this the same setup where you were getting errors because of
>>>> inconsistent database states?
>>>> That could lead to this problem, you know.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Sat, Jul 13, 2019 at 10:14 AM Cihad Guzel  wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> I also have a job waiting for 12 days as an "Aborting" status.
>>>>>
>>>>> Status: Aborting
>>>>> Start time: 7/1/19 4:01:46 PM
>>>>> Documents: 10003
>>>>> Active: 10003
>>>>> Processed: 10002
>>>>>
>>>>> Cihad Guzel
>>>>>
>>>>>
>>>>> Cihad Guzel , 13 Tem 2019 Cmt, 16:53 tarihinde
>>>>> şunu yazdı:
>>>>>
>>>>>> Hi Karl,
>>>>>>
>>>>>> I tried quick-start single process model. After your suggestion , i
>>>>>> have tried multiprocess-zk-example for zookeeper-based locking. But I 
>>>>>> have
>>>>>> the same problem.
>>>>>>
>>>>>> Status: Aborting
>>>>>> Start time: 7/13/19 3:42:10 PM
>>>>>> Documents: 10003
>>>>>> Active: 10003
>>>>>> Processed: 1021
>>>>>>
>>>>>> I'm waiting for over an hour for the jdbc job to stop. I have not
>>>>>> any error logs in my manifolcf log.
>>>>>>
>>>>>> Cihad Güzel
>>>>>>
>>>>>> Cihad Güzel
>>>>>>
>>>>>>
>>>>>> Karl Wright , 8 Tem 2019 Pzt, 13:23 tarihinde
>>>>>> şunu yazdı:
>>>>>>
>>>>>>> Are you using file-based locking?
>>>>>>> If so, I would suggest strongly migrating to zookeeper-based locking.
>>>>>>> But if you are using the file-based locking, please execute the
>>>>>>> "lock clean procedure" as follows:
>>>>>>>
>>>>>>> - shut down all manifoldcf processes, including the web UI
>>>>>>> - run the lock-clean script
>>>>>>> - start the processes again
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jul 8, 2019 at 5:05 AM C

Re: Some jobs are waiting in "stopping" status

2019-07-13 Thread Cihad Guzel
Hi Karl,

No, this is a different, new setup, but I use the same MCF and database
versions. I set up a new environment for every test. I didn't see any
repeated (or one-off) error logs like before.

Then I built the JDBC connector from the trunk branch and replaced
jdbc-connector.jar with the new version. Now there is a new, non-repeating
error log, as follows:

ERROR 2019-07-13T16:20:34,259 (Seeding thread) - Exception tossed:
Unexpected job status: 33
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected job
status: 33
at
org.apache.manifoldcf.crawler.jobs.JobManager.resetSeedJob(JobManager.java:7934)
~[mcf-pull-agent.jar:?]
at
org.apache.manifoldcf.crawler.system.SeedingThread.run(SeedingThread.java:242)
[mcf-pull-agent.jar:?]

Cihad Guzel


Karl Wright wrote the following on Sat, Jul 13, 2019 at 18:46:

> You previously reported errors of the kind that ManifoldCF throws when it
> finds that the database seemingly lost transactional integrity.
> My question is whether you are still using the same database setup where
> you previously got those errors?
>
> The ArrayIndexOutOfBounds exception applied to JDBC connector metadata
> indexing.  If you are using the JDBC connector with metadata without the
> patch then it could explain your problem also.  But you would be seeing
> exceptions thrown over and over again in your logs.
>
> Karl
>
>
> On Sat, Jul 13, 2019 at 11:12 AM Cihad Guzel  wrote:
>
>> Hi Karl,
>>
>> If you are talking about is
>> https://issues.apache.org/jira/browse/CONNECTORS-1613, my setup doesn't
>> include this change because I use mfc 2.12. Are you suggesting I use a
>> trunk version?
>>
>> Cihad Güzel
>>
>> Karl Wright , 13 Tem 2019 Cmt, 17:27 tarihinde şunu
>> yazdı:
>>
>>> Is this the same setup where you were getting errors because of
>>> inconsistent database states?
>>> That could lead to this problem, you know.
>>>
>>> Karl
>>>
>>>
>>> On Sat, Jul 13, 2019 at 10:14 AM Cihad Guzel  wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> I also have a job waiting for 12 days as an "Aborting" status.
>>>>
>>>> Status: Aborting
>>>> Start time: 7/1/19 4:01:46 PM
>>>> Documents: 10003
>>>> Active: 10003
>>>> Processed: 10002
>>>>
>>>> Cihad Guzel
>>>>
>>>>
>>>> Cihad Guzel , 13 Tem 2019 Cmt, 16:53 tarihinde şunu
>>>> yazdı:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> I tried quick-start single process model. After your suggestion , i
>>>>> have tried multiprocess-zk-example for zookeeper-based locking. But I have
>>>>> the same problem.
>>>>>
>>>>> Status: Aborting
>>>>> Start time: 7/13/19 3:42:10 PM
>>>>> Documents: 10003
>>>>> Active: 10003
>>>>> Processed: 1021
>>>>>
>>>>> I'm waiting for over an hour for the jdbc job to stop. I have not  any
>>>>> error logs in my manifolcf log.
>>>>>
>>>>> Cihad Güzel
>>>>>
>>>>> Cihad Güzel
>>>>>
>>>>>
>>>>> Karl Wright , 8 Tem 2019 Pzt, 13:23 tarihinde
>>>>> şunu yazdı:
>>>>>
>>>>>> Are you using file-based locking?
>>>>>> If so, I would suggest strongly migrating to zookeeper-based locking.
>>>>>> But if you are using the file-based locking, please execute the "lock
>>>>>> clean procedure" as follows:
>>>>>>
>>>>>> - shut down all manifoldcf processes, including the web UI
>>>>>> - run the lock-clean script
>>>>>> - start the processes again
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 8, 2019 at 5:05 AM Cihad Guzel  wrote:
>>>>>>
>>>>>>> Hi Karl,
>>>>>>>
>>>>>>> Nothing. I don't have any error log.
>>>>>>>
>>>>>>> 8 Tem 2019 Pzt 03:18 tarihinde Karl Wright 
>>>>>>> şunu yazdı:
>>>>>>>
>>>>>>>> Hi Cihad,
>>>>>>>>
>>>>>>>> What does your manifoldcf log have in it?  Any errors?
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Jul 7, 2019 at 3:52 PM Cihad Guzel 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Karl,
>>>>>>>>>
>>>>>>>>> I mistakenly wrote "Stopping" instead of "Aborting". My job is
>>>>>>>>> waiting as "Aborting" status. I have also the same problem while
>>>>>>>>> restarting. I am waiting for 2 days for one job.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Cihad Guzel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cihad Guzel , 7 Tem 2019 Paz, 22:42 tarihinde
>>>>>>>>> şunu yazdı:
>>>>>>>>>
>>>>>>>>>> Hi Karl,
>>>>>>>>>>
>>>>>>>>>> I have a few jobs. I stopped all of them but only one job is
>>>>>>>>>> waiting as "stopping" status.
>>>>>>>>>>
>>>>>>>>>> I know that some large jobs is waited long time. But, I have only
>>>>>>>>>> 1000 rows on database. So, all of jobs crawled small number of 
>>>>>>>>>> documents.
>>>>>>>>>> But , It doesn't make much sense to stay status of "stopping" for a 
>>>>>>>>>> long
>>>>>>>>>> time. How can I identify a problem?
>>>>>>>>>>
>>>>>>>>>> Postgresql version: 9.4
>>>>>>>>>> Manifoldcf version: 2.12
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Cihad Guzel
>>>>>>>>>>
>>>>>>>>>


Re: Some jobs are waiting in "stopping" status

2019-07-13 Thread Cihad Guzel
Hi Karl,

If you are talking about
https://issues.apache.org/jira/browse/CONNECTORS-1613, my setup doesn't
include that change because I use MCF 2.12. Are you suggesting I use the
trunk version?

Cihad Güzel

Karl Wright wrote the following on Sat, Jul 13, 2019 at 17:27:

> Is this the same setup where you were getting errors because of
> inconsistent database states?
> That could lead to this problem, you know.
>
> Karl
>
>
> On Sat, Jul 13, 2019 at 10:14 AM Cihad Guzel  wrote:
>
>> Hi Karl,
>>
>> I also have a job waiting for 12 days as an "Aborting" status.
>>
>> Status: Aborting
>> Start time: 7/1/19 4:01:46 PM
>> Documents: 10003
>> Active: 10003
>> Processed: 10002
>>
>> Cihad Guzel
>>
>>
>> Cihad Guzel , 13 Tem 2019 Cmt, 16:53 tarihinde şunu
>> yazdı:
>>
>>> Hi Karl,
>>>
>>> I tried quick-start single process model. After your suggestion , i have
>>> tried multiprocess-zk-example for zookeeper-based locking. But I have the
>>> same problem.
>>>
>>> Status: Aborting
>>> Start time: 7/13/19 3:42:10 PM
>>> Documents: 10003
>>> Active: 10003
>>> Processed: 1021
>>>
>>> I'm waiting for over an hour for the jdbc job to stop. I have not  any
>>> error logs in my manifolcf log.
>>>
>>> Cihad Güzel
>>>
>>> Cihad Güzel
>>>
>>>
>>> Karl Wright , 8 Tem 2019 Pzt, 13:23 tarihinde şunu
>>> yazdı:
>>>
>>>> Are you using file-based locking?
>>>> If so, I would suggest strongly migrating to zookeeper-based locking.
>>>> But if you are using the file-based locking, please execute the "lock
>>>> clean procedure" as follows:
>>>>
>>>> - shut down all manifoldcf processes, including the web UI
>>>> - run the lock-clean script
>>>> - start the processes again
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>>
>>>> On Mon, Jul 8, 2019 at 5:05 AM Cihad Guzel  wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> Nothing. I don't have any error log.
>>>>>
>>>>> 8 Tem 2019 Pzt 03:18 tarihinde Karl Wright  şunu
>>>>> yazdı:
>>>>>
>>>>>> Hi Cihad,
>>>>>>
>>>>>> What does your manifoldcf log have in it?  Any errors?
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Sun, Jul 7, 2019 at 3:52 PM Cihad Guzel  wrote:
>>>>>>
>>>>>>> Hi Karl,
>>>>>>>
>>>>>>> I mistakenly wrote "Stopping" instead of "Aborting". My job is
>>>>>>> waiting as "Aborting" status. I have also the same problem while
>>>>>>> restarting. I am waiting for 2 days for one job.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Cihad Guzel
>>>>>>>
>>>>>>>
>>>>>>> Cihad Guzel , 7 Tem 2019 Paz, 22:42 tarihinde
>>>>>>> şunu yazdı:
>>>>>>>
>>>>>>>> Hi Karl,
>>>>>>>>
>>>>>>>> I have a few jobs. I stopped all of them but only one job is
>>>>>>>> waiting as "stopping" status.
>>>>>>>>
>>>>>>>> I know that some large jobs is waited long time. But, I have only
>>>>>>>> 1000 rows on database. So, all of jobs crawled small number of 
>>>>>>>> documents.
>>>>>>>> But , It doesn't make much sense to stay status of "stopping" for a 
>>>>>>>> long
>>>>>>>> time. How can I identify a problem?
>>>>>>>>
>>>>>>>> Postgresql version: 9.4
>>>>>>>> Manifoldcf version: 2.12
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Cihad Guzel
>>>>>>>>
>>>>>>>


Re: Some jobs are waiting in "stopping" status

2019-07-13 Thread Cihad Guzel
Hi Karl,

I also have a job that has been waiting in "Aborting" status for 12 days.

Status: Aborting
Start time: 7/1/19 4:01:46 PM
Documents: 10003
Active: 10003
Processed: 10002

Cihad Guzel


Cihad Guzel wrote the following on Sat, Jul 13, 2019 at 16:53:

> Hi Karl,
>
> I tried quick-start single process model. After your suggestion , i have
> tried multiprocess-zk-example for zookeeper-based locking. But I have the
> same problem.
>
> Status: Aborting
> Start time: 7/13/19 3:42:10 PM
> Documents: 10003
> Active: 10003
> Processed: 1021
>
> I'm waiting for over an hour for the jdbc job to stop. I have not  any
> error logs in my manifolcf log.
>
> Cihad Güzel
>
> Cihad Güzel
>
>
> Karl Wright , 8 Tem 2019 Pzt, 13:23 tarihinde şunu
> yazdı:
>
>> Are you using file-based locking?
>> If so, I would suggest strongly migrating to zookeeper-based locking.
>> But if you are using the file-based locking, please execute the "lock
>> clean procedure" as follows:
>>
>> - shut down all manifoldcf processes, including the web UI
>> - run the lock-clean script
>> - start the processes again
>>
>> Thanks,
>> Karl
>>
>>
>> On Mon, Jul 8, 2019 at 5:05 AM Cihad Guzel  wrote:
>>
>>> Hi Karl,
>>>
>>> Nothing. I don't have any error log.
>>>
>>> 8 Tem 2019 Pzt 03:18 tarihinde Karl Wright  şunu
>>> yazdı:
>>>
>>>> Hi Cihad,
>>>>
>>>> What does your manifoldcf log have in it?  Any errors?
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Sun, Jul 7, 2019 at 3:52 PM Cihad Guzel  wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> I mistakenly wrote "Stopping" instead of "Aborting". My job is waiting
>>>>> as "Aborting" status. I have also the same problem while restarting. I am
>>>>> waiting for 2 days for one job.
>>>>>
>>>>> Regards,
>>>>> Cihad Guzel
>>>>>
>>>>>
>>>>> Cihad Guzel , 7 Tem 2019 Paz, 22:42 tarihinde şunu
>>>>> yazdı:
>>>>>
>>>>>> Hi Karl,
>>>>>>
>>>>>> I have a few jobs. I stopped all of them but only one job is waiting
>>>>>> as "stopping" status.
>>>>>>
>>>>>> I know that some large jobs is waited long time. But, I have only
>>>>>> 1000 rows on database. So, all of jobs crawled small number of documents.
>>>>>> But , It doesn't make much sense to stay status of "stopping" for a long
>>>>>> time. How can I identify a problem?
>>>>>>
>>>>>> Postgresql version: 9.4
>>>>>> Manifoldcf version: 2.12
>>>>>>
>>>>>> Regards,
>>>>>> Cihad Guzel
>>>>>>
>>>>>


Re: Some jobs are waiting in "stopping" status

2019-07-13 Thread Cihad Guzel
Hi Karl,

I tried the quick-start single-process model. After your suggestion, I tried
the multiprocess-zk-example for ZooKeeper-based locking, but I have the same
problem.

Status: Aborting
Start time: 7/13/19 3:42:10 PM
Documents: 10003
Active: 10003
Processed: 1021

I have been waiting for over an hour for the JDBC job to stop. I don't have
any error logs in my ManifoldCF log.

Cihad Güzel


Karl Wright wrote the following on Mon, Jul 8, 2019 at 13:23:

> Are you using file-based locking?
> If so, I would suggest strongly migrating to zookeeper-based locking.
> But if you are using the file-based locking, please execute the "lock
> clean procedure" as follows:
>
> - shut down all manifoldcf processes, including the web UI
> - run the lock-clean script
> - start the processes again
>
> Thanks,
> Karl
>
>
> On Mon, Jul 8, 2019 at 5:05 AM Cihad Guzel  wrote:
>
>> Hi Karl,
>>
>> Nothing. I don't have any error log.
>>
>> 8 Tem 2019 Pzt 03:18 tarihinde Karl Wright  şunu
>> yazdı:
>>
>>> Hi Cihad,
>>>
>>> What does your manifoldcf log have in it?  Any errors?
>>>
>>> Karl
>>>
>>>
>>> On Sun, Jul 7, 2019 at 3:52 PM Cihad Guzel  wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> I mistakenly wrote "Stopping" instead of "Aborting". My job is waiting
>>>> as "Aborting" status. I have also the same problem while restarting. I am
>>>> waiting for 2 days for one job.
>>>>
>>>> Regards,
>>>> Cihad Guzel
>>>>
>>>>
>>>> Cihad Guzel , 7 Tem 2019 Paz, 22:42 tarihinde şunu
>>>> yazdı:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> I have a few jobs. I stopped all of them but only one job is waiting
>>>>> as "stopping" status.
>>>>>
>>>>> I know that some large jobs is waited long time. But, I have only 1000
>>>>> rows on database. So, all of jobs crawled small number of documents. But ,
>>>>> It doesn't make much sense to stay status of "stopping" for a long time.
>>>>> How can I identify a problem?
>>>>>
>>>>> Postgresql version: 9.4
>>>>> Manifoldcf version: 2.12
>>>>>
>>>>> Regards,
>>>>> Cihad Guzel
>>>>>
>>>>


Re: Some jobs are waiting in "stopping" status

2019-07-08 Thread Cihad Guzel
Hi Karl,

Nothing. I don't have any error logs.

On Mon, Jul 8, 2019 at 03:18, Karl Wright wrote:

> Hi Cihad,
>
> What does your manifoldcf log have in it?  Any errors?
>
> Karl
>
>
> On Sun, Jul 7, 2019 at 3:52 PM Cihad Guzel  wrote:
>
>> Hi Karl,
>>
>> I mistakenly wrote "Stopping" instead of "Aborting". My job is waiting as
>> "Aborting" status. I have also the same problem while restarting. I am
>> waiting for 2 days for one job.
>>
>> Regards,
>> Cihad Guzel
>>
>>
>> Cihad Guzel , 7 Tem 2019 Paz, 22:42 tarihinde şunu
>> yazdı:
>>
>>> Hi Karl,
>>>
>>> I have a few jobs. I stopped all of them but only one job is waiting as
>>> "stopping" status.
>>>
>>> I know that some large jobs is waited long time. But, I have only 1000
>>> rows on database. So, all of jobs crawled small number of documents. But ,
>>> It doesn't make much sense to stay status of "stopping" for a long time.
>>> How can I identify a problem?
>>>
>>> Postgresql version: 9.4
>>> Manifoldcf version: 2.12
>>>
>>> Regards,
>>> Cihad Guzel
>>>
>>


Re: Some jobs are waiting in "stopping" status

2019-07-07 Thread Cihad Guzel
Hi Karl,

I mistakenly wrote "Stopping" instead of "Aborting". My job is waiting in
"Aborting" status. I also have the same problem while restarting. I have been
waiting for 2 days for one job.

Regards,
Cihad Guzel


Cihad Guzel wrote the following on Sun, Jul 7, 2019 at 22:42:

> Hi Karl,
>
> I have a few jobs. I stopped all of them but only one job is waiting as
> "stopping" status.
>
> I know that some large jobs is waited long time. But, I have only 1000
> rows on database. So, all of jobs crawled small number of documents. But ,
> It doesn't make much sense to stay status of "stopping" for a long time.
> How can I identify a problem?
>
> Postgresql version: 9.4
> Manifoldcf version: 2.12
>
> Regards,
> Cihad Guzel
>


Some jobs are waiting in "stopping" status

2019-07-07 Thread Cihad Guzel
Hi Karl,

I have a few jobs. I stopped all of them, but one job is stuck in "stopping"
status.

I know that some large jobs take a long time to stop. But I have only 1000
rows in the database, so all of the jobs crawled a small number of documents.
It doesn't make much sense for a job to stay in "stopping" status for a long
time. How can I identify the problem?

Postgresql version: 9.4
Manifoldcf version: 2.12

Regards,
Cihad Guzel


JDBC Connector Max Connection Size is hardcoded

2019-06-28 Thread Cihad Guzel
Hi Karl,

I created a new JDBC repository connection and set "Max connections" and
"Max avg fetches/min" on the Throttling tab in the MCF UI. Then I reviewed
JDBCConnectionFactory.java and encountered some hardcoded parameters, as
follows:

cp = _pool.addAlias(poolKey, driverClassName, dburl, userName,
  password, 30, 30L);

The maximum connection count is set to 30.

Are we sure that the parameters entered on that screen are actually used?
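
One possible direction (just a sketch, not the actual fix; the property name
"org.apache.manifoldcf.jdbc.maxpoolsize" below is invented purely for
illustration) would be to stop passing the literal 30 and take the pool size
from something configurable, for example:

// Sketch only: replace the hardcoded 30 with a configurable value.
// "org.apache.manifoldcf.jdbc.maxpoolsize" is a hypothetical property name.
final int maxPoolSize = Integer.parseInt(
  System.getProperty("org.apache.manifoldcf.jdbc.maxpoolsize", "30"));

cp = _pool.addAlias(poolKey, driverClassName, dburl, userName,
  password, maxPoolSize, 30L);

Ideally the value would come from the throttling settings entered in the UI,
if the connector can see them at this point.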

Regards,
Cihad Guzel


Re: ssh connector

2019-06-24 Thread Cihad Guzel
Hi Saunier,

ManifoldCF does not have an SFTP connector yet. You can write a new
connector for this purpose. Please follow this guide:
https://manifoldcf.apache.org/release/release-2.12/en_US/writing-repository-connectors.html

Regards,
Cihad Guzel


SAUNIER Maxence wrote the following on Mon, Jun 24, 2019 at 14:57:

> Hello Cihad,
>
>
>
> And for SFTP, do you know what I could use as a connector ?
>
>
>
> Thanks
>
>
>
> *De :* Cihad Guzel 
> *Envoyé :* dimanche 23 juin 2019 20:51
> *À :* user@manifoldcf.apache.org
> *Objet :* Re: ssh connector
>
>
>
> Hi Saunier,
>
> SSH is a network protocol for securely communicating between computers. It
> is not a repository. You can use jcifs connector for remote shared files.
> Jcifs is support samba connection for remote shared files. Please follow :
> http://manifoldcf.apache.org/release/release-2.12/en_US/end-user-documentation.html#jcifsrepository
>
>
>
> Cihad Güzel
>
>
>
>
>
> SAUNIER Maxence , 17 Haz 2019 Pzt, 15:23 tarihinde şunu
> yazdı:
>
> I want to use an SSH or SFTP connection type for my repository connection.
> But, I do not know which connector to use. Do you have any idea ?
>
>
>
> Thanks
>
>
>
>
>
> *De :* Cihad Guzel 
> *Envoyé :* lundi 17 juin 2019 12:24
> *À :* user@manifoldcf.apache.org
> *Objet :* Re: ssh connector
>
>
>
> Hi Saunier,
>
>
>
> What do you want do exactly via ssh?
>
>
>
> Cihad Güzel
>
>
>
>
>
> SAUNIER Maxence , 17 Haz 2019 Pzt, 12:37 tarihinde şunu
> yazdı:
>
> Hello Karl,
>
>
>
> Do ou have any news for this question ?
>
>
>
> Thanks you,
>
>
>
> *De :* SAUNIER Maxence
> *Envoyé :* mercredi 12 juin 2019 12:16
> *À :* user@manifoldcf.apache.org
> *Objet :* ssh connector
>
>
>
> Do you have any SSH connector for the repository ?
>
>


Re: Unexpected job status encountered

2019-06-24 Thread Cihad Guzel
Hi Karl,

I have 2 types of errors, as follows:

FATAL 2019-06-24T09:37:27,226 (Worker thread '3') - Error tossed: 7
java.lang.ArrayIndexOutOfBoundsException: 7
at
org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector.applyMultiAttributeValues(JDBCConnector.java:2188)
~[?:?]
at
org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector.processDocuments(JDBCConnector.java:785)
~[?:?]
at
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
[mcf-pull-agent.jar:?]
ERROR 2019-06-24T09:37:46,763 (Seeding thread) - Exception tossed:
Unexpected job status: 0
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected job
status: 0
at
org.apache.manifoldcf.crawler.jobs.JobManager.resetSeedJob(JobManager.java:7934)
~[mcf-pull-agent.jar:?]
at
org.apache.manifoldcf.crawler.system.SeedingThread.run(SeedingThread.java:242)
[mcf-pull-agent.jar:?]

Regards,
Cihad Guzel


Cihad Guzel wrote the following on Mon, Jun 24, 2019 at 12:39:

> Hi Karl,
>
> I have new log as follow:
>
> FATAL 2019-06-24T09:36:49,591 (Worker thread '1') - Error tossed: 7
> java.lang.ArrayIndexOutOfBoundsException: 7
> at
> org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector.applyMultiAttributeValues(JDBCConnector.java:2188)
> ~[?:?]
> at
> org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector.processDocuments(JDBCConnector.java:785)
> ~[?:?]
> at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
> FATAL 2019-06-24T09:36:49,591 (Worker thread '0') - Error tossed: 7
> java.lang.ArrayIndexOutOfBoundsException: 7
> at
> org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector.applyMultiAttributeValues(JDBCConnector.java:2188)
> ~[?:?]
> at
> org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector.processDocuments(JDBCConnector.java:785)
> ~[?:?]
> at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
> FATAL 2019-06-24T09:36:50,206 (Worker thread '4') - Error tossed: 7
> java.lang.ArrayIndexOutOfBoundsException: 7
> at
> org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector.applyMultiAttributeValues(JDBCConnector.java:2188)
> ~[?:?]
> at
> org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector.processDocuments(JDBCConnector.java:785)
> ~[?:?]
> at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
> FATAL 2019-06-24T09:36:50,398 (Worker thread '3') - Error tossed: 7
> java.lang.ArrayIndexOutOfBoundsException: 7
> at
> org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector.applyMultiAttributeValues(JDBCConnector.java:2188)
> ~[?:?]
> at
> org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector.processDocuments(JDBCConnector.java:785)
> ~[?:?]
> at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
> FATAL 2019-06-24T09:36:50,776 (Worker thread '5') - Error tossed: 7
>
> Cihad Güzel
>
>
> Karl Wright , 23 Haz 2019 Paz, 22:14 tarihinde şunu
> yazdı:
>
>> Hi Cihad,
>>
>> The JVM suppresses traces after the first thousand or so for this kind of
>> exception.  So you need to go to the beginning of the log.
>>
>> Karl
>>
>>
>> On Sun, Jun 23, 2019 at 3:07 PM Cihad Guzel  wrote:
>>
>>> Hi Karl,
>>>
>>> No, I have not any different lines. These error logs continue to be
>>> written in logs file. I have only this lines:
>>>
>>> FATAL 2019-06-23T18:50:36,225 (Worker thread '16') - Error tossed: null
>>> java.lang.ArrayIndexOutOfBoundsException
>>> FATAL 2019-06-23T18:50:36,266 (Worker thread '12') - Error tossed: null
>>> java.lang.ArrayIndexOutOfBoundsException
>>> FATAL 2019-06-23T18:50:36,402 (Worker thread '13') - Error tossed: null
>>> java.lang.ArrayIndexOutOfBoundsException
>>> FATAL 2019-06-23T18:50:37,774 (Worker thread '22') - Error tossed: null
>>> java.lang.ArrayIndexOutOfBoundsException
>>> FATAL 2019-06-23T18:50:38,083 (Worker thread '27') - Error tossed: null
>>> java.lang.ArrayIndexOutOfBoundsException
>>> FATAL 2019-06-23T18:50:38,223 (Worker thread '23') - Error tossed: null
>>> java.lang.ArrayIndexOutOfBoundsException
>>> FATAL 2019-06-23T18:50:38,536 (Worker thread '15') - Error tossed: null
>>> java.lang.ArrayIndexOutOfBoundsException
>>>
>>>
>>> Cihad Güzel
>>>
>>>
>>> Karl Wright , 23 Haz 2019 Paz, 21:53 tarihinde şunu
>>> yazdı:
>>>
>>>> Hi Cihad,
>>>>
>>>> Do you have a stack trace of the ArrayIndexOutOfBounds exception?  It
>>>> would 

Re: ssh connector

2019-06-23 Thread Cihad Guzel
Hi Saunier,

SSH is a network protocol for communicating securely between computers; it is
not a repository. You can use the JCIFS connector for remote shared files.
JCIFS supports SMB/Samba connections to remote shared files. Please follow:
http://manifoldcf.apache.org/release/release-2.12/en_US/end-user-documentation.html#jcifsrepository

Cihad Güzel


SAUNIER Maxence wrote the following on Mon, Jun 17, 2019 at 15:23:

> I want to use an SSH or SFTP connection type for my repository connection.
> But, I do not know which connector to use. Do you have any idea ?
>
>
>
> Thanks
>
>
>
>
>
> *De :* Cihad Guzel 
> *Envoyé :* lundi 17 juin 2019 12:24
> *À :* user@manifoldcf.apache.org
> *Objet :* Re: ssh connector
>
>
>
> Hi Saunier,
>
>
>
> What do you want do exactly via ssh?
>
>
>
> Cihad Güzel
>
>
>
>
>
> SAUNIER Maxence , 17 Haz 2019 Pzt, 12:37 tarihinde şunu
> yazdı:
>
> Hello Karl,
>
>
>
> Do ou have any news for this question ?
>
>
>
> Thanks you,
>
>
>
> *De :* SAUNIER Maxence
> *Envoyé :* mercredi 12 juin 2019 12:16
> *À :* user@manifoldcf.apache.org
> *Objet :* ssh connector
>
>
>
> Do you have any SSH connector for the repository ?
>
>


Unexpected job status encountered

2019-06-23 Thread Cihad Guzel
Hi,

My crawler received an error and has been waiting in aborting status for a
long time. My manifoldcf.log has some repetitive error entries, as follows:

java.lang.ArrayIndexOutOfBoundsException
FATAL 2019-06-19T21:58:12,180 (Worker thread '5') - Error tossed: null

So I restarted the ManifoldCF agent. The waiting job then stopped and its
status was updated to "Error".

I opened the Status of Jobs table in the ManifoldCF interface and saw the
following error message:
"Error: Unexpected job status encountered: 1"

The error message is printed by Job.java, but I don't understand why this
error is encountered. Is there an error that is not being handled?

Regards,
Cihad Güzel


Re: ssh connector

2019-06-17 Thread Cihad Guzel
Hi Saunier,

What exactly do you want to do via SSH?

Cihad Güzel


SAUNIER Maxence wrote the following on Mon, Jun 17, 2019 at 12:37:

> Hello Karl,
>
>
>
> Do ou have any news for this question ?
>
>
>
> Thanks you,
>
>
>
> *De :* SAUNIER Maxence
> *Envoyé :* mercredi 12 juin 2019 12:16
> *À :* user@manifoldcf.apache.org
> *Objet :* ssh connector
>
>
>
> Do you have any SSH connector for the repository ?
>


About the authorities to get ACLs

2019-05-13 Thread Cihad Guzel
Hi,

We need to get ACLs for a file crawl with the Shared Drive (Windows Share)
connector, so we need a user authorized to read ACLs. I suggest that the
crawler user be in the "Domain Admins" group in Active Directory, but that is
an authority people do not want to grant easily.

So, do you have any guide or advice for using a less privileged user?

Regards,
Cihad Guzel


schedule job

2019-04-24 Thread Cihad Guzel
Hi,

I am trying to schedule a job. I created a job and set schedule values on the
Scheduling tab. I waited, but my job hasn't run.

How do I run my job at a specified time?

Regards,
Cihad Guzel


about schedule job

2019-04-16 Thread Cihad Guzel
Hi,

I am trying to schedule a job. We can select "day of week" and "day of month"
on the Scheduling tab. I think we should choose only one of the two. Is that
right?

Is there a case where both are used at the same time?

Regards,
Cihad Guzel


to handle multiple tables in a db crawler job

2019-04-07 Thread Cihad Guzel
Hi,

I am trying the database crawler. I see that one job handles only one table.
Is there any way to handle multiple tables (or all tables) of a database in
one job?

Regards,
Cihad Guzel


Difference between Maximum document length and Max file size

2019-02-27 Thread Cihad Guzel
Hi,

What is the difference between "Maximum document length" on the "Content
Length" tab and "Max file size (bytes)" on the "Allowed Content" tab?

I think that:

"Maximum document length" is the extracted data size, and
"Max file size" is the raw file size.

Is that right?

Regards,
Cihad Guzel


Re: custom jcifs properties

2019-02-25 Thread Cihad Guzel
Hi Karl,

In some cases, "jcifs" is running slowly. In order to solve this problem,
we need to set custom some properties.

For example; my problem was in my test environment: I have a windows server
and an ubuntu server in same network in AWS EC2 Service. The windows server
has Active Directory service, DNS Server and shared folder while the ubuntu
server has some instance such as manifoldcf, an db instance and solr.

If the DNS settings are not defined on the ubuntu server, jcifs runs
slowly. Because the default resolver order is set as 'LMHOSTS,DNS,WINS'. It
means[1]; firstly "jcifs" checks '/etc/hosts' files for linux/unix
server'', then it checks the DNS server. In my opinion, the linux server
doesn't recognize the DNS server and threads are waiting for every file for
access to read.

I suppose, WINS is used when accessing hosts on different subnets. So, I
have set "jcifs.resolveOrder = WINS" and my problem has been FIXED.

I suppose, WINS is used when accessing hosts on different subnets.

Another suggestion for similar problem from another example[2]:
"-Djcifs.resolveOrder = DNS"

Finally, I suggest these changes (a rough sketch follows below):

Remove the line
System.setProperty("jcifs.resolveOrder","LMHOSTS,DNS,WINS");
from SharedDriveConnector.java.

Add "-Djcifs.resolveOrder=LMHOSTS,DNS,WINS" to the "start-options.env" file.

If you are convinced by this, I can create a PR.

[1] https://www.jcifs.org/src/docs/resolver.html
[2] https://stackoverflow.com/a/18837754

Regards,
Cihad Guzel

Karl Wright wrote the following on Sun, Feb 24, 2019 at 19:20:

> These settings were provided by the developer of jcifs, Michael Allen.
> You have to really understand the protocol well before you should consider
> changing them in any way.
>
> Thanks,
> Karl
>
>
> On Sun, Feb 24, 2019 at 9:53 AM Cihad Guzel  wrote:
>
>> Hi,
>>
>> SharedDriveConnector have some hardcoded system properties as follow:
>>
>> static
>> {
>>   System.setProperty("jcifs.smb.client.soTimeout","15");
>>   System.setProperty("jcifs.smb.client.responseTimeout","12");
>>   System.setProperty("jcifs.resolveOrder","LMHOSTS,DNS,WINS");
>>   System.setProperty("jcifs.smb.client.listCount","20");
>>   System.setProperty("jcifs.smb.client.dfs.strictView","true");
>> }
>>
>> How can I override them when to start manifoldcf?
>>
>> It may be better to define these settings in the start-options.env file.
>>
>> Regards,
>> Cihad Guzel
>>
>


custom jcifs properties

2019-02-24 Thread Cihad Guzel
Hi,

SharedDriveConnector has some hardcoded system properties, as follows:

static
{
  System.setProperty("jcifs.smb.client.soTimeout","15");
  System.setProperty("jcifs.smb.client.responseTimeout","12");
  System.setProperty("jcifs.resolveOrder","LMHOSTS,DNS,WINS");
  System.setProperty("jcifs.smb.client.listCount","20");
  System.setProperty("jcifs.smb.client.dfs.strictView","true");
}

How can I override them when starting ManifoldCF?

It may be better to define these settings in the start-options.env file.

Regards,
Cihad Guzel


Re: Job hangs in aborting state for a long time

2019-02-11 Thread Cihad Guzel
Hi Karl,

Which PostgreSQL version do you recommend using with MCF 2.12:
9.3, 9.4, 10+, or 11+?

Thanks,
Cihad Guzel


Karl Wright wrote the following on Mon, Feb 11, 2019 at 04:01:

> No, it is not normal.  I expect that the MySQL transaction issues are
> causing lots of problems.
>
> Karl
>
>
> On Sun, Feb 10, 2019 at 7:13 PM Cihad Guzel  wrote:
>
>> Hi Karl,
>>
>> I use MySQL. I'll also try with PostgreSQL.
>>
>> All docs are processed one day ago. Is it normal for the aborting process
>> or finishing up threads to take so long?
>>
>> Thanks,
>> Cihad Guzel
>>
>>
>> Karl Wright , 11 Şub 2019 Pzt, 02:37 tarihinde şunu
>> yazdı:
>>
>>> What database is this?
>>> Basically, the "unexpected job status" means that the framework found
>>> something that should not have been possible, if the database had been
>>> properly enforcing ACID transactional constraints.  Is this MySQL?  Because
>>> if so it's known to have this problem.
>>>
>>> It also looks like MCF is trying to recover from some other problem
>>> (usually a database error).  I can tell this because that's what the
>>> particular thread in question does.  In order to recover, all worker
>>> threads must finish up with what they are doing and then everything can
>>> resync -- and that's not working because the database isn't in agreement
>>> that all the worker threads are shut down.
>>>
>>> Karl
>>>
>>>
>>> On Sun, Feb 10, 2019 at 6:23 PM Cihad Guzel  wrote:
>>>
>>>> Hi,
>>>>
>>>> I try external TIKA extractor. I have 4 continuously file crawler jobs.
>>>> Two of them have external tika extractor. One of them processed all
>>>> documents that is only 98 docs. The job is hanging in "Aborting" state when
>>>> manually abort. I waited more than 1 day and then the state changed.
>>>>
>>>> How can I find the problem?
>>>>
>>>> mysql> SELECT status, errortext, type, startmethod, id FROM jobs;
>>>> ++---+--+-+---+
>>>> | status | errortext | type | startmethod | id|
>>>> ++---+--+-+---+
>>>> | N  | NULL  | C| D   | 1549371059083 |
>>>> | X  | NULL  | C| D   | 1549371135463 |
>>>> | N  | NULL  | C| D   | 1549371226082 |
>>>> | N  | NULL  | C| D   | 1549805173512 |
>>>> ++---+--+-+---+
>>>>
>>>> I'm not sure this is relevant to it, but I have too many error logging
>>>> like this:
>>>>
>>>> ERROR 2019-02-10T22:47:28,178 (Job reset thread) - Exception tossed:
>>>> Unexpected job status encountered: 33
>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected
>>>> job status encountered: 33
>>>> at
>>>> org.apache.manifoldcf.crawler.jobs.Jobs.returnJobToActive(Jobs.java:2145)
>>>> ~[mcf-pull-agent.jar:?]
>>>> at
>>>> org.apache.manifoldcf.crawler.jobs.JobManager.resetJobs(JobManager.java:8608)
>>>> ~[mcf-pull-agent.jar:?]
>>>> at
>>>> org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:77)
>>>> [mcf-pull-agent.jar:?]
>>>> ERROR 2019-02-10T22:47:28,182 (Job reset thread) - Exception tossed:
>>>> Unexpected job status encountered: 33
>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected
>>>> job status encountered: 33
>>>> at
>>>> org.apache.manifoldcf.crawler.jobs.Jobs.returnJobToActive(Jobs.java:2145)
>>>> ~[mcf-pull-agent.jar:?]
>>>> at
>>>> org.apache.manifoldcf.crawler.jobs.JobManager.resetJobs(JobManager.java:8608)
>>>> ~[mcf-pull-agent.jar:?]
>>>> at
>>>> org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:77)
>>>> [mcf-pull-agent.jar:?]
>>>>
>>>>
>>>> Regards,
>>>> Cihad Güzel
>>>>
>>>


Job hangs in aborting state for a long time

2019-02-10 Thread Cihad Guzel
Hi,

I am trying the external Tika extractor. I have 4 continuously running file
crawler jobs. Two of them use the external Tika extractor. One of them
processed all of its documents, which is only 98 docs. The job hangs in the
"Aborting" state when I abort it manually. I waited more than 1 day before
the state changed.

How can I find the problem?

mysql> SELECT status, errortext, type, startmethod, id FROM jobs;
+--------+-----------+------+-------------+---------------+
| status | errortext | type | startmethod | id            |
+--------+-----------+------+-------------+---------------+
| N      | NULL      | C    | D           | 1549371059083 |
| X      | NULL      | C    | D           | 1549371135463 |
| N      | NULL      | C    | D           | 1549371226082 |
| N      | NULL      | C    | D           | 1549805173512 |
+--------+-----------+------+-------------+---------------+

I'm not sure whether this is relevant, but I have many error log entries like
this:

ERROR 2019-02-10T22:47:28,178 (Job reset thread) - Exception tossed:
Unexpected job status encountered: 33
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected job
status encountered: 33
at
org.apache.manifoldcf.crawler.jobs.Jobs.returnJobToActive(Jobs.java:2145)
~[mcf-pull-agent.jar:?]
at
org.apache.manifoldcf.crawler.jobs.JobManager.resetJobs(JobManager.java:8608)
~[mcf-pull-agent.jar:?]
at
org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:77)
[mcf-pull-agent.jar:?]
ERROR 2019-02-10T22:47:28,182 (Job reset thread) - Exception tossed:
Unexpected job status encountered: 33
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected job
status encountered: 33
at
org.apache.manifoldcf.crawler.jobs.Jobs.returnJobToActive(Jobs.java:2145)
~[mcf-pull-agent.jar:?]
at
org.apache.manifoldcf.crawler.jobs.JobManager.resetJobs(JobManager.java:8608)
~[mcf-pull-agent.jar:?]
at
org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:77)
[mcf-pull-agent.jar:?]


Regards,
Cihad Güzel


Re: Job error during WindowsShare repository connector indexation

2017-10-11 Thread Cihad Guzel
Hi Olivier,

Did you try to connect to the Samba server with any Samba client application?
Check iptables on your server. Can you stop iptables on the Ubuntu server?
Alternatively, perhaps you can configure iptables appropriately.

Regards,
Cihad Guzel


2017-10-11 12:02 GMT+03:00 Olivier Tavard <olivier.tav...@francelabs.com>:

> Hi,
>
> I had this error during crawling a Samba hosted on Ubuntu Server :
> ERROR 2017-10-05 00:00:14,109 (Idle cleanup thread) - MCF|MCF-agent|apache.
> manifoldcf.crawlerthreads|Exception tossed: Service '_ANON_0' of type
> '_REPOSITORYCONNECTORPOOL_SmbFileShare' is not active
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Service
> '_ANON_0' of type '_REPOSITORYCONNECTORPOOL_SmbFileShare' is not active
> at org.apache.manifoldcf.core.lockmanager.BaseLockManager.
> updateServiceData(BaseLockManager.java:273)
> at org.apache.manifoldcf.core.lockmanager.LockManager.
> updateServiceData(LockManager.java:108)
> at org.apache.manifoldcf.core.connectorpool.ConnectorPool$
> Pool.pollAll(ConnectorPool.java:654)
> at org.apache.manifoldcf.core.connectorpool.ConnectorPool.
> pollAllConnectors(ConnectorPool.java:338)
> at org.apache.manifoldcf.crawler.repositoryconnectorpool.
> RepositoryConnectorPool.pollAllConnectors(RepositoryConnectorPool.java:
> 124)
> at org.apache.manifoldcf.crawler.system.IdleCleanupThread.run(
> IdleCleanupThread.java:68)
>
> I used MCF 2.8.1 on Debian 8 with Postgresql 9.5.3, Windows Share
> repository connector. The job was configured to process about 2 millions of
> files  (600 GB).
> For text extraction I used a Tika server (on the same server as MCF) and
> add the Tika external content extractor transformation connector into the
> job configuration.
> The error was present 9 hours after the job was launched. The status job
> still indicated that the job was running but there was only 1 document in
> the active column and the error above was repeated in the MCF log.
>
> Then I tried to launch the clean-lock.sh script and I obtained this error :
> WARN 2017-10-09 08:23:56,284 (Idle cleanup thread) - 
> MCF|MCF-agent|apache.manifoldcf.lock|Attempt
> to set file lock 'mcf/mcf_home/./syncharea/551/442/lock-_POOLTARGET__
> REPOSITORYCONNECTORPOOL_SmbFileShare.lock' failed: No such file or
> directory
> java.io.IOException: No such file or directory
> at java.io.UnixFileSystem.createFileExclusively(Native Method)
> at java.io.File.createNewFile(File.java:1012)
> at org.apache.manifoldcf.core.lockmanager.FileLockObject.
> grabFileLock(FileLockObject.java:223)
> at org.apache.manifoldcf.core.lockmanager.FileLockObject.
> obtainGlobalWriteLockNoWait(FileLockObject.java:78)
> at org.apache.manifoldcf.core.lockmanager.LockObject.
> obtainGlobalWriteLock(LockObject.java:121)
> at org.apache.manifoldcf.core.lockmanager.LockObject.
> enterWriteLock(LockObject.java:74)
> at org.apache.manifoldcf.core.lockmanager.LockGate.
> enterWriteLock(LockGate.java:177)
> at org.apache.manifoldcf.core.lockmanager.BaseLockManager.
> enterWrite(BaseLockManager.java:1120)
> at org.apache.manifoldcf.core.lockmanager.BaseLockManager.enterWriteLock(
> BaseLockManager.java:757)
> at org.apache.manifoldcf.core.lockmanager.LockManager.
> enterWriteLock(LockManager.java:302)
> at org.apache.manifoldcf.core.connectorpool.ConnectorPool$
> Pool.pollAll(ConnectorPool.java:585)
> at org.apache.manifoldcf.core.connectorpool.ConnectorPool.
> pollAllConnectors(ConnectorPool.java:338)
> at org.apache.manifoldcf.crawler.repositoryconnectorpool.
> RepositoryConnectorPool.pollAllConnectors(RepositoryConnectorPool.java:
> 124)
> at org.apache.manifoldcf.crawlerui.IdleCleanupThread.
> run(IdleCleanupThread.java:69)
> And the error was repeated indefinitely in the log.
>
> Did it mean that there was a problem with the syncharea folder at some
> point ?
>
> Thanks,
> Best regards,
>
> Olivier TAVARD
>



-- 
Cihad Güzel


email job is down

2017-04-28 Thread Cihad Guzel
Hi,

I have created an email job that runs continuously. For some reason, an error
brings my job down. Because this job needs to run continuously, errors should
be handled and the job shouldn't go down. Is there anything we can do about
it?

My error is as follows:

ERROR 2017-04-25T14:25:44,475 (Seeding thread) - Exception tossed: Error
finding emails: * BYE JavaMail Exception: java.io.IOException: Connection
dropped by server?
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Error finding
emails: * BYE JavaMail Exception: java.io.IOException: Connection dropped
by server?
at
org.apache.manifoldcf.crawler.connectors.email.EmailConnector.handleMessagingException(EmailConnector.java:1721)
~[?:?]
at
org.apache.manifoldcf.crawler.connectors.email.EmailConnector.addSeedDocuments(EmailConnector.java:335)
~[?:?]
at
org.apache.manifoldcf.crawler.system.SeedingThread.run(SeedingThread.java:150)
[classes/:?]
Caused by: javax.mail.MessagingException: * BYE JavaMail Exception:
java.io.IOException: Connection dropped by server?
at com.sun.mail.imap.IMAPFolder.open(IMAPFolder.java:961)
~[mail-1.4.5.jar:1.4.5]
at
org.apache.manifoldcf.crawler.connectors.email.EmailSession.openFolder(EmailSession.java:99)
~[?:?]
at
org.apache.manifoldcf.crawler.connectors.email.EmailConnector$OpenFolderThread.run(EmailConnector.java:1981)
~[?:?]
Caused by: com.sun.mail.iap.ConnectionException: * BYE JavaMail Exception:
java.io.IOException: Connection dropped by server?
at com.sun.mail.iap.Protocol.handleResult(Protocol.java:356)
~[mail-1.4.5.jar:1.4.5]
at com.sun.mail.imap.protocol.IMAPProtocol.examine(IMAPProtocol.java:886)
~[mail-1.4.5.jar:1.4.5]
at com.sun.mail.imap.IMAPFolder.open(IMAPFolder.java:925)
~[mail-1.4.5.jar:1.4.5]
at
org.apache.manifoldcf.crawler.connectors.email.EmailSession.openFolder(EmailSession.java:99)
~[?:?]
at
org.apache.manifoldcf.crawler.connectors.email.EmailConnector$OpenFolderThread.run(EmailConnector.java:1981)
~[?:?]
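
Coming back to the question of handling this: below is only a rough sketch of
the kind of behaviour I mean. I am assuming here that a repository connector
may throw ServiceInterruption during seeding so the framework retries later
instead of erroring the job; the class and method names are mine, for
illustration only.

import javax.mail.MessagingException;
import org.apache.manifoldcf.agents.interfaces.ServiceInterruption;
import org.apache.manifoldcf.core.interfaces.ManifoldCFException;

public class TransientEmailErrorSketch
{
  // Hypothetical helper: report a dropped IMAP connection as a transient
  // condition (retry in 5 minutes) rather than as a hard job error.
  public static void handleDroppedConnection(MessagingException e)
    throws ManifoldCFException, ServiceInterruption
  {
    final long retryAt = System.currentTimeMillis() + 5L * 60L * 1000L;
    throw new ServiceInterruption(
      "Email server dropped the connection: " + e.getMessage(), retryAt);
  }
}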

-- 
Cihad Güzel
Regards


Email filtering does not work for Exchange Server

2017-04-23 Thread Cihad Guzel
Hi,

I am trying the email connector with filtering. It runs successfully with
Gmail. However, if I try it with Exchange Server, no emails are indexed.

Do I have to set any configuration properties on the "Server" tab of the
email connector's repository connection for Exchange?

-- 
Cihad Güzel
Regards


manifoldcf build

2017-03-28 Thread Cihad Guzel
Hi,

I am building trunk. I use "ant clean", "ant build", "ant make-deps" and "ant
make-core-deps".

There aren't any files in dist/example. Also, I couldn't find these
directories:

connector-lib
connector-common-lib
connector-lib-proprietary

Have you made any changes? How do I build it?

-- 
Cihad Güzel


Re: Multilingual support with manifolds

2017-03-28 Thread Cihad Guzel
Hi Sreenivas,

If you mean something like the "language-identifer" plugin of Nutch ,
ManifoldCF does not have this kind of thing.

2017-03-28 14:23 GMT+03:00 Konrad Holl :

> Hi Sreenivas,
>
>
>
> the language support will only be relevant in the search engine itself
> (SharePoint). It will detect the languages and apply linguistic processing
> as needed during indexing and search time.
>
>
>
> -Konrad
>
>
>
> *From:* Karl Wright [mailto:daddy...@gmail.com]
> *Sent:* Dienstag, 28. März 2017 13:22
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Multilingual support with manifolds
>
>
>
> Hi,
>
>
>
> ManifoldCF uses utf-8 and binary throughout for its actual function, so it
> is not language specific in any way at that level.  Its UI has been
> localized (more or less) for four languages: English, Spanish, Japanese,
> and Chinese.
>
>
>
> Hope that helps,
>
> Karl
>
>
>
>
>
> On Tue, Mar 28, 2017 at 6:13 AM, Sreenivas.T  wrote:
>
> Hi,
>
>
>
> I'm new to manifold connector framework. I could not find documentation
> regarding multilingual support of sharepoint, email connectors & regular
> web crawlers. Please let me know if it has support to multilingual and if
> it has what are the languages that it support.
>
>
>
> I'm planning to use manifold cf instead of nutch for web crawling purposes
> too.
>
>
>
> Thanks,
>
> Sreenivas
>
>
>




Regards,
Cihad Güzel


Sharepoint authority

2017-03-20 Thread Cihad Guzel
Hi,

I want to use the SharePoint connector with Active Directory. Do I have to
create both an "Active Directory" and a "SharePoint Native Authority"
authority connection, or only Active Directory?

-- 
Thanks,
Cihad Güzel


Re: SharePoint crawler ArrayIndexOutOfBoundsException in log

2017-03-17 Thread Cihad Guzel
Hi,

I am using Oracle JDK 1.8.0_77. I will try the new HttpClient version and get
back to you.

Thanks
Cihad Güzel


2017-03-17 23:38 GMT+03:00 Markus Schuch <markus_sch...@web.de>:

> Hi,
>
> i think this may caused by
>
>   https://issues.apache.org/jira/browse/HTTPCLIENT-1715
>
> which was fixed in httpclient 4.5.2
>
> There is a very similar stacktrace in
>
>   https://issues.apache.org/jira/browse/HTTPCLIENT-1686
>
> which is also linked to HTTPCLIENT-1715.
>
> Cheers,
> Markus
>
> Am 17.03.2017 um 19:27 schrieb Karl Wright:
> > Hi Cihad,
> >
> > There are NTLMEngineImpl tests that exercise precisely the case that is
> > failing.  I'm therefore becoming convinced that there is something very
> > odd about your installation.  Are you using a non-standard JVM, for
> > instance?
> >
> > Karl
> >
> >
> > On Fri, Mar 17, 2017 at 10:28 AM, Karl Wright <daddy...@gmail.com
> > <mailto:daddy...@gmail.com>> wrote:
> >
> > Hi Cihad,
> >
> > Could you also check out and build the latest 4.5.x httpclient, from
> > this branch?
> >
> > https://svn.apache.org/repos/asf/httpcomponents/httpclient/
> branches/pull-66
> > <https://svn.apache.org/repos/asf/httpcomponents/httpclient/
> branches/pull-66>
> >
> > You will need maven for this but otherwise you can build it any way
> > you like.  Replace the "httpclient-4.5.1.jar" in the lib directory
> > with the jar you build, and then you can rebuild MCF.  See if you
> > still get the error.  If you do, it should be possible to chase it
> > down more readily.
> >
> > Thanks,
> > Karl
> >
> >
> > On Fri, Mar 17, 2017 at 9:57 AM, Cihad Guzel <cguz...@gmail.com
> > <mailto:cguz...@gmail.com>> wrote:
> >
> > No. I don't use any custom library.
> >
> > I try with manifoldcf trunk on my notebook. I install sharepoint
> > 2013 on ms server 2012 for testing with default configuration.
> >
> > 17 Mar 2017 16:05 tarihinde "Karl Wright" <daddy...@gmail.com
> > <mailto:daddy...@gmail.com>> yazdı:
> >
> > Hmm, I can see no way this can happen.  Are you by any
> > chance using a modified version of the HttpClient library?
> > Karl
> >
> >
> > On Fri, Mar 17, 2017 at 8:09 AM, Karl Wright
> > <daddy...@gmail.com <mailto:daddy...@gmail.com>> wrote:
> >
> > Hi Cihad,
> >
> > This is very interesting because the problem is coming
> > from Httpclient's NTLM engine.  The allocated packet
> > size for the Type 1 message is being exceeded, which I
> > didn't think was even possible.
> >
> > This may be a result of credentials that you have
> > supplied being strange in some way.  Let me look at the
> > Httpclient code and get back to you.
> >
> > Karl
> >
> >
> > On Fri, Mar 17, 2017 at 7:57 AM, Cihad Guzel
> > <cguz...@gmail.com <mailto:cguz...@gmail.com>> wrote:
> >
> > Hi,
> >
> > I try sharepoint connector with Active Directory in
> > debug mode. I saw ArrayIndexOutOfBoundException in
> > manifoldcf.log file. Any bugs?
> >
> > DEBUG 2017-03-17 14:30:48,386 (Worker thread '0') -
> > SharePoint: Getting version of '/Documents2//Step by
> > step Installation of SharePoint 2013 on Windows
> > Server 2012 R2 part 1 - SharePoint Community.pdf'
> > DEBUG 2017-03-17 14:30:48,466 (Worker thread '0') -
> > SharePoint: Checking whether to include document
> > '/Documents2/Step by step Installation of SharePoint
> > 2013 on Windows Server 2012 R2 part 1 - SharePoint
> > Community.pdf'
> > DEBUG 2017-03-17 14:30:48,466 (Worker thread '0') -
> > SharePoint: File '/Documents2/Step by step
> > Installation of SharePoint 2013 on Windows Server
> > 2012 R2 part 1 - SharePoint Community.pdf' exactly
> > matched rule path '/Documents2/*'
> > DEBUG 2017-

Re: SharePoint crawler ArrayIndexOutOfBoundsException in log

2017-03-17 Thread Cihad Guzel
No, I don't use any custom library.

I am trying with ManifoldCF trunk on my notebook. I installed SharePoint 2013
on Windows Server 2012 for testing, with the default configuration.

On Mar 17, 2017 at 16:05, "Karl Wright" <daddy...@gmail.com> wrote:

> Hmm, I can see no way this can happen.  Are you by any chance using a
> modified version of the HttpClient library?
> Karl
>
>
> On Fri, Mar 17, 2017 at 8:09 AM, Karl Wright <daddy...@gmail.com> wrote:
>
>> Hi Cihad,
>>
>> This is very interesting because the problem is coming from Httpclient's
>> NTLM engine.  The allocated packet size for the Type 1 message is being
>> exceeded, which I didn't think was even possible.
>>
>> This may be a result of credentials that you have supplied being strange
>> in some way.  Let me look at the Httpclient code and get back to you.
>>
>> Karl
>>
>>
>> On Fri, Mar 17, 2017 at 7:57 AM, Cihad Guzel <cguz...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I try sharepoint connector with Active Directory in debug mode. I saw
>>> ArrayIndexOutOfBoundException in manifoldcf.log file. Any bugs?
>>>
>>> DEBUG 2017-03-17 14:30:48,386 (Worker thread '0') - SharePoint: Getting
>>> version of '/Documents2//Step by step Installation of SharePoint 2013 on
>>> Windows Server 2012 R2 part 1 - SharePoint Community.pdf'
>>> DEBUG 2017-03-17 14:30:48,466 (Worker thread '0') - SharePoint: Checking
>>> whether to include document '/Documents2/Step by step Installation of
>>> SharePoint 2013 on Windows Server 2012 R2 part 1 - SharePoint Community.pdf'
>>> DEBUG 2017-03-17 14:30:48,466 (Worker thread '0') - SharePoint: File
>>> '/Documents2/Step by step Installation of SharePoint 2013 on Windows Server
>>> 2012 R2 part 1 - SharePoint Community.pdf' exactly matched rule path
>>> '/Documents2/*'
>>> DEBUG 2017-03-17 14:30:48,467 (Worker thread '0') - SharePoint:
>>> Including file '/Documents2/Step by step Installation of SharePoint 2013 on
>>> Windows Server 2012 R2 part 1 - SharePoint Community.pdf'
>>> DEBUG 2017-03-17 14:30:48,468 (Worker thread '0') - SharePoint: Finding
>>> metadata to include for document/item '/Documents2/Step by step
>>> Installation of SharePoint 2013 on Windows Server 2012 R2 part 1 -
>>> SharePoint Community.pdf'.
>>> DEBUG 2017-03-17 14:30:48,510 (Worker thread '0') - SharePoint: In
>>> getFieldValues; fieldNames=[Ljava.lang.String;@69f1a61a, site='',
>>> docLibrary='{1B694C45-DF1F-44E7-9814-F5096E85A126}',
>>> docId='/Documents2/Step by step Installation of SharePoint 2013 on Windows
>>> Server 2012 R2 part 1 - SharePoint Community.pdf', dspStsWorks=false
>>> DEBUG 2017-03-17 14:30:48,539 (Worker thread '5') - SharePoint: Getting
>>> version of '/Documents2//'
>>> DEBUG 2017-03-17 14:30:48,539 (Worker thread '4') - SharePoint: Getting
>>> version of '/Documents2//CXFCA3100080010.pdf'
>>> DEBUG 2017-03-17 14:30:48,539 (Worker thread '4') - SharePoint: Checking
>>> whether to include document '/Documents2/CXFCA3100080010.pdf'
>>> DEBUG 2017-03-17 14:30:48,539 (Worker thread '4') - SharePoint: File
>>> '/Documents2/CXFCA3100080010.pdf' exactly matched rule path
>>> '/Documents2/*'
>>> DEBUG 2017-03-17 14:30:48,539 (Worker thread '4') - SharePoint:
>>> Including file '/Documents2/CXFCA3100080010.pdf'
>>> DEBUG 2017-03-17 14:30:48,539 (Worker thread '5') - SharePoint: Checking
>>> whether to include library '/Documents2'
>>> DEBUG 2017-03-17 14:30:48,539 (Worker thread '4') - SharePoint: Finding
>>> metadata to include for document/item '/Documents2/CXFCA3100080010.pdf'.
>>> DEBUG 2017-03-17 14:30:48,539 (Worker thread '5') - SharePoint: Library
>>> '/Documents2' partially matched file rule path '/Documents2/*' - including
>>> DEBUG 2017-03-17 14:30:48,539 (Worker thread '5') - SharePoint: Document
>>> identifier is a library: '/Documents2'
>>> DEBUG 2017-03-17 14:30:48,539 (Worker thread '5') - SharePoint: In
>>> getDocLibID; parentSite='', parentSiteDecoded='', docLibrary='Documents2'
>>> DEBUG 2017-03-17 14:30:48,540 (Worker thread '2') - SharePoint: Getting
>>> version of '/'
>>> DEBUG 2017-03-17 14:30:48,540 (Worker thread '2') - SharePoint: Checking
>>> whether to include site '/'
>>> DEBUG 2017-03-17 14:30:48,540 (Worker thread '2') - SharePoint: Site '/'
>>> partially matched file rule path '/Documents2/*' - including
>>> DEBUG 2017-03-17 14:30:48,548 (Worker thread '4') - SharePoint: In
>>> getFieldValues; f

SharePoint crawler ArrayIndexOutOfBoundException in log

2017-03-17 Thread Cihad Guzel
Hi,

I try sharepoint connector with Active Directory in debug mode. I saw
ArrayIndexOutOfBoundException in manifoldcf.log file. Any bugs?

DEBUG 2017-03-17 14:30:48,386 (Worker thread '0') - SharePoint: Getting
version of '/Documents2//Step by step Installation of SharePoint 2013 on
Windows Server 2012 R2 part 1 - SharePoint Community.pdf'
DEBUG 2017-03-17 14:30:48,466 (Worker thread '0') - SharePoint: Checking
whether to include document '/Documents2/Step by step Installation of
SharePoint 2013 on Windows Server 2012 R2 part 1 - SharePoint Community.pdf'
DEBUG 2017-03-17 14:30:48,466 (Worker thread '0') - SharePoint: File
'/Documents2/Step by step Installation of SharePoint 2013 on Windows Server
2012 R2 part 1 - SharePoint Community.pdf' exactly matched rule path
'/Documents2/*'
DEBUG 2017-03-17 14:30:48,467 (Worker thread '0') - SharePoint: Including
file '/Documents2/Step by step Installation of SharePoint 2013 on Windows
Server 2012 R2 part 1 - SharePoint Community.pdf'
DEBUG 2017-03-17 14:30:48,468 (Worker thread '0') - SharePoint: Finding
metadata to include for document/item '/Documents2/Step by step
Installation of SharePoint 2013 on Windows Server 2012 R2 part 1 -
SharePoint Community.pdf'.
DEBUG 2017-03-17 14:30:48,510 (Worker thread '0') - SharePoint: In
getFieldValues; fieldNames=[Ljava.lang.String;@69f1a61a, site='',
docLibrary='{1B694C45-DF1F-44E7-9814-F5096E85A126}',
docId='/Documents2/Step by step Installation of SharePoint 2013 on Windows
Server 2012 R2 part 1 - SharePoint Community.pdf', dspStsWorks=false
DEBUG 2017-03-17 14:30:48,539 (Worker thread '5') - SharePoint: Getting
version of '/Documents2//'
DEBUG 2017-03-17 14:30:48,539 (Worker thread '4') - SharePoint: Getting
version of '/Documents2//CXFCA3100080010.pdf'
DEBUG 2017-03-17 14:30:48,539 (Worker thread '4') - SharePoint: Checking
whether to include document '/Documents2/CXFCA3100080010.pdf'
DEBUG 2017-03-17 14:30:48,539 (Worker thread '4') - SharePoint: File
'/Documents2/CXFCA3100080010.pdf' exactly matched rule path '/Documents2/*'
DEBUG 2017-03-17 14:30:48,539 (Worker thread '4') - SharePoint: Including
file '/Documents2/CXFCA3100080010.pdf'
DEBUG 2017-03-17 14:30:48,539 (Worker thread '5') - SharePoint: Checking
whether to include library '/Documents2'
DEBUG 2017-03-17 14:30:48,539 (Worker thread '4') - SharePoint: Finding
metadata to include for document/item '/Documents2/CXFCA3100080010.pdf'.
DEBUG 2017-03-17 14:30:48,539 (Worker thread '5') - SharePoint: Library
'/Documents2' partially matched file rule path '/Documents2/*' - including
DEBUG 2017-03-17 14:30:48,539 (Worker thread '5') - SharePoint: Document
identifier is a library: '/Documents2'
DEBUG 2017-03-17 14:30:48,539 (Worker thread '5') - SharePoint: In
getDocLibID; parentSite='', parentSiteDecoded='', docLibrary='Documents2'
DEBUG 2017-03-17 14:30:48,540 (Worker thread '2') - SharePoint: Getting
version of '/'
DEBUG 2017-03-17 14:30:48,540 (Worker thread '2') - SharePoint: Checking
whether to include site '/'
DEBUG 2017-03-17 14:30:48,540 (Worker thread '2') - SharePoint: Site '/'
partially matched file rule path '/Documents2/*' - including
DEBUG 2017-03-17 14:30:48,548 (Worker thread '4') - SharePoint: In
getFieldValues; fieldNames=[Ljava.lang.String;@6f447d2e, site='',
docLibrary='{1B694C45-DF1F-44E7-9814-F5096E85A126}',
docId='/Documents2/CXFCA3100080010.pdf', dspStsWorks=false
DEBUG 2017-03-17 14:30:48,560 (Worker thread '2') - SharePoint: Document
identifier is a site: ''
DEBUG 2017-03-17 14:30:48,560 (Worker thread '2') - SharePoint: In
getSites; parentSite=''
DEBUG 2017-03-17 14:30:50,398 (Worker thread '4') - SharePoint: Got a
remote exception getting field values for site  library
{1B694C45-DF1F-44E7-9814-F5096E85A126} document
[/Documents2/CXFCA3100080010.pdf] - retrying
AxisFault
 faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
 faultSubcode:
 faultString: java.lang.ArrayIndexOutOfBoundsException: 41
 faultActor:
 faultNode:
 faultDetail:
{
http://xml.apache.org/axis/}stackTrace:java.lang.ArrayIndexOutOfBoundsException:
41
at
org.apache.http.impl.auth.NTLMEngineImpl$NTLMMessage.addByte(NTLMEngineImpl.java:911)
at
org.apache.http.impl.auth.NTLMEngineImpl$NTLMMessage.addULong(NTLMEngineImpl.java:941)
at
org.apache.http.impl.auth.NTLMEngineImpl$Type1Message.getResponse(NTLMEngineImpl.java:1043)
at
org.apache.http.impl.auth.NTLMEngineImpl.getType1Message(NTLMEngineImpl.java:148)
at
org.apache.http.impl.auth.NTLMEngineImpl.generateType1Msg(NTLMEngineImpl.java:1628)
at org.apache.http.impl.auth.NTLMScheme.authenticate(NTLMScheme.java:139)
at
org.apache.http.impl.auth.AuthSchemeBase.authenticate(AuthSchemeBase.java:138)
at
org.apache.http.impl.auth.HttpAuthenticator.doAuth(HttpAuthenticator.java:239)
at
org.apache.http.impl.auth.HttpAuthenticator.generateAuthResponse(HttpAuthenticator.java:202)
at
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:262)
at

MS Exchange support

2017-03-05 Thread Cihad Guzel
Hi,

Does the MCF Email connector support Microsoft Exchange? As far as I can see,
it doesn't.

-- 
Regards
Cihad Güzel


Re: extract email attachment

2017-02-09 Thread Cihad Guzel
Thanks Karl.

Regards,
Cihad Guzel

2017-02-09 16:27 GMT+03:00 Karl Wright <daddy...@gmail.com>:

> Hi Cihad,
> The comparison should have been:
>
> mp.getCount() <= attachmentNumber
>
> As for changing ":" to "/", the real problem is that these should all be
> ":"'s, including line 678.  My apologies.  I've committed the changes.
>
> Thanks,
> Karl
>
>
> On Thu, Feb 9, 2017 at 8:15 AM, Cihad Guzel <cguz...@gmail.com> wrote:
>
>> Hi Karl,
>>
>> mp.getCount() is 2
>> and
>> attachmentNumber is '0' or '1' in my case.
>>
>> Regards,
>> Cihad Guzel
>>
>> 2017-02-09 16:07 GMT+03:00 Cihad Guzel <cguz...@gmail.com>:
>>
>>> Hi Karl,
>>>
>>> I made some changes in the code and then the indexing was done
>>> successfully.
>>>
>>> The changes are as follows:
>>>
>>> I have removed these lines (lines: 772-775):
>>>
>>>  if (mp.getCount() >= attachmentNumber) {
>>> activities.deleteDocument(documentIdentifier);
>>> continue;
>>>   }
>>>
>>> I updated these lines: (lines :1485 and 1586)
>>>   int index2 = di.indexOf("/", index1 + 1);
>>> as like:
>>>   int index2 = di.indexOf(":", index1 + 1);
>>>
>>> Regards,
>>> Cihad Guzel
>>>
>>>
>>>
>>>
>>> 2017-02-08 2:10 GMT+03:00 Karl Wright <daddy...@gmail.com>:
>>>
>>>> Hi Cihad,
>>>>
>>>> You need to set an attachment URL template for the attachments to be
>>>> crawled.  Open your email connection and click the "URL" tab, and you will
>>>> see the new field there.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Tue, Feb 7, 2017 at 6:07 PM, Cihad Guzel <cguz...@gmail.com> wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> Does not 'else' part has to be proccessed when the email has an
>>>>> attachment?
>>>>> Although the email has an attachment, only the first part was
>>>>> processed. Also, I don't see the attachment's content in solr index.
>>>>>
>>>>> I edited the code line for testing as follow:
>>>>>
>>>>>  if (attachmentIndex == null) {
>>>>>   // It's an email
>>>>>   System.out.println("running if block");
>>>>> ...
>>>>> } else {
>>>>>   System.out.println("running else block");
>>>>>       // It's an attachment
>>>>>   attachmentNumber = attachmentIndex;
>>>>> ...
>>>>> }
>>>>>
>>>>> Then, I run my job. It processed 3 times. The log looks as like:
>>>>>
>>>>> ...
>>>>> running if block
>>>>> running if block
>>>>> running if block
>>>>> ...
>>>>>
>>>>>
>>>>> The solr response:
>>>>>
>>>>> {
>>>>> "subject":["pdf test page"],
>>>>> "from":["Cihad Guzel <cguz...@gmail.com>"],
>>>>> "id":"http://sampleserver/%C4%B0%C5%9F%2FmyFolder%2Ftest?id=
>>>>> %3CCADNgPDgSXHeWo0GDnUL6S2sogUsXUa9mx2WxOT23Wi37Hog5Gw%40mai
>>>>> l.gmail.com%3E",
>>>>> "date":["Tue Feb 07 20:37:35 MSK 2017"],
>>>>> "mimetype":["",
>>>>>   ""],
>>>>> "created_date":"2017-02-07T17:37:35.000Z",
>>>>> "indexed_date":"2017-02-07T21:18:05.382Z",
>>>>> "to":["Cihad Guzel <cguz...@gmail.com>"],
>>>>> "modified_date":"2017-02-07T17:37:35.000Z",
>>>>> "encoding":["",
>>>>>   ""],
>>>>> "mime_type":"text/plain",
>>>>> "stream_size":["null"],
>>>>> "x_parsed_by":["org.apache.tika.parser.DefaultParser",
>>>>>   "org.apache.tika.parser.txt.TXTParser"],
>>>>> &

Re: extract email attachment

2017-02-09 Thread Cihad Guzel
Hi Karl,

mp.getCount() is 2
and
attachmentNumber is '0' or '1' in my case.

Regards,
Cihad Guzel

2017-02-09 16:07 GMT+03:00 Cihad Guzel <cguz...@gmail.com>:

> Hi Karl,
>
> I made some changes in the code and then the indexing was done
> successfully.
>
> The changes are as follows:
>
> I have removed these lines (lines: 772-775):
>
>  if (mp.getCount() >= attachmentNumber) {
> activities.deleteDocument(documentIdentifier);
> continue;
>   }
>
> I updated these lines: (lines :1485 and 1586)
>   int index2 = di.indexOf("/", index1 + 1);
> as like:
>   int index2 = di.indexOf(":", index1 + 1);
>
> Regards,
> Cihad Guzel
>
>
>
>
> 2017-02-08 2:10 GMT+03:00 Karl Wright <daddy...@gmail.com>:
>
>> Hi Cihad,
>>
>> You need to set an attachment URL template for the attachments to be
>> crawled.  Open your email connection and click the "URL" tab, and you will
>> see the new field there.
>>
>> Karl
>>
>>
>> On Tue, Feb 7, 2017 at 6:07 PM, Cihad Guzel <cguz...@gmail.com> wrote:
>>
>>> Hi Karl,
>>>
>>> Does not 'else' part has to be proccessed when the email has an
>>> attachment?
>>> Although the email has an attachment, only the first part was processed.
>>> Also, I don't see the attachment's content in solr index.
>>>
>>> I edited the code line for testing as follow:
>>>
>>>  if (attachmentIndex == null) {
>>>   // It's an email
>>>   System.out.println("running if block");
>>> ...
>>> } else {
>>>   System.out.println("running else block");
>>>       // It's an attachment
>>>   attachmentNumber = attachmentIndex;
>>> ...
>>> }
>>>
>>> Then, I run my job. It processed 3 times. The log looks as like:
>>>
>>> ...
>>> running if block
>>> running if block
>>> running if block
>>> ...
>>>
>>>
>>> The solr response:
>>>
>>> {
>>> "subject":["pdf test page"],
>>> "from":["Cihad Guzel <cguz...@gmail.com>"],
>>> "id":"http://sampleserver/%C4%B0%C5%9F%2FmyFolder%2Ftest?id=
>>> %3CCADNgPDgSXHeWo0GDnUL6S2sogUsXUa9mx2WxOT23Wi37Hog5Gw%40mai
>>> l.gmail.com%3E",
>>> "date":["Tue Feb 07 20:37:35 MSK 2017"],
>>> "mimetype":["",
>>>   ""],
>>> "created_date":"2017-02-07T17:37:35.000Z",
>>> "indexed_date":"2017-02-07T21:18:05.382Z",
>>> "to":["Cihad Guzel <cguz...@gmail.com>"],
>>> "modified_date":"2017-02-07T17:37:35.000Z",
>>> "encoding":["",
>>>   ""],
>>> "mime_type":"text/plain",
>>> "stream_size":["null"],
>>> "x_parsed_by":["org.apache.tika.parser.DefaultParser",
>>>   "org.apache.tika.parser.txt.TXTParser"],
>>> "stream_content_type":["text/plain"],
>>> "content_encoding":["windows-1252"],
>>> "content_type":["text/plain; charset=windows-1252"],
>>> "content":" \n \n  \n  \n  \n  \n  \n  \n  \n \n
>>>  --94eb2c1910841bc55f0547f43443\r\nContent-Type: multipart/alternative;
>>> boundary=94eb2c1910841bc5530547f43441\r\n\r\n--94eb2c1910841
>>> bc5530547f43441\r\nContent-Type: text/plain; charset=UTF-8\r\n\r\nthis
>>> is test mail for mfc.\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type:
>>> text/html; charset=UTF-8\r\n\r\nthis is test mail for
>>> mfc.\r\n\r\n\r\n--94eb2c1910841bc5530547f43441--\r\n--
>>> 94eb2c1910841bc55f0547f43443\r\nContent-Type: application/pdf;
>>> name=\"pdf-test.pdf\"\r\nContent-Disposition: attachment;
>>> filename=\"pdf-test.pdf\"\r\nContent-Transfer-Encoding:
>>> base64\r\nX-Attachment-Id: f_iyvt78qa0\r\n\r\nJVBERi0xLjY
>>> NJeLjz9MNCjM3IDAgb2JqIDw8L0xpbmVhcml6ZWQgMS9... ",
>>> "language":"en",
>>> "_version_":1558710621053124608}]
>>>   }
>>>
>>>

Re: extract email attachment

2017-02-09 Thread Cihad Guzel
Hi Karl,

I made some changes in the code and then the indexing was done successfully.

The changes are as follows:

I have removed these lines (lines 772-775):

  if (mp.getCount() >= attachmentNumber) {
    activities.deleteDocument(documentIdentifier);
    continue;
  }

I updated these lines (lines 1485 and 1586):

  int index2 = di.indexOf("/", index1 + 1);

to:

  int index2 = di.indexOf(":", index1 + 1);
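
For clarity, here is a small standalone sketch (the identifier layout below is
only a hypothetical illustration, not the connector's actual format) of why ':'
is the safer separator here: the folder name itself may contain '/', so
splitting on '/' can pick the wrong boundary.

public class AttachmentIdSketch {
  public static void main(String[] args) {
    // Hypothetical identifier: attachment index, folder name, message id.
    String di = "1:İş/myFolder/test:<CADNgPDg...@mail.gmail.com>";
    int index1 = di.indexOf(":");
    int index2 = di.indexOf(":", index1 + 1);
    int attachmentNumber = Integer.parseInt(di.substring(0, index1));
    String folderName = di.substring(index1 + 1, index2);
    String messageId = di.substring(index2 + 1);
    // If '/' were used to find the boundaries instead, the first boundary would
    // land inside "İş/myFolder/test" and Integer.parseInt would receive a
    // non-numeric piece, matching the NumberFormatException seen in this thread.
    System.out.println(attachmentNumber + " | " + folderName + " | " + messageId);
  }
}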

Regards,
Cihad Guzel




2017-02-08 2:10 GMT+03:00 Karl Wright <daddy...@gmail.com>:

> Hi Cihad,
>
> You need to set an attachment URL template for the attachments to be
> crawled.  Open your email connection and click the "URL" tab, and you will
> see the new field there.
>
> Karl
>
>
> On Tue, Feb 7, 2017 at 6:07 PM, Cihad Guzel <cguz...@gmail.com> wrote:
>
>> Hi Karl,
>>
>> Does not 'else' part has to be proccessed when the email has an
>> attachment?
>> Although the email has an attachment, only the first part was processed.
>> Also, I don't see the attachment's content in solr index.
>>
>> I edited the code line for testing as follow:
>>
>>  if (attachmentIndex == null) {
>>   // It's an email
>>   System.out.println("running if block");
>> ...
>> } else {
>>   System.out.println("running else block");
>>   // It's an attachment
>>   attachmentNumber = attachmentIndex;
>> ...
>> }
>>
>> Then, I run my job. It processed 3 times. The log looks as like:
>>
>> ...
>> running if block
>> running if block
>> running if block
>> ...
>>
>>
>> The solr response:
>>
>> {
>> "subject":["pdf test page"],
>> "from":["Cihad Guzel <cguz...@gmail.com>"],
>> "id":"http://sampleserver/%C4%B0%C5%9F%2FmyFolder%2Ftest?id=
>> %3CCADNgPDgSXHeWo0GDnUL6S2sogUsXUa9mx2WxOT23Wi37Hog5Gw%40mai
>> l.gmail.com%3E",
>> "date":["Tue Feb 07 20:37:35 MSK 2017"],
>> "mimetype":["",
>>   ""],
>> "created_date":"2017-02-07T17:37:35.000Z",
>> "indexed_date":"2017-02-07T21:18:05.382Z",
>> "to":["Cihad Guzel <cguz...@gmail.com>"],
>> "modified_date":"2017-02-07T17:37:35.000Z",
>> "encoding":["",
>>   ""],
>> "mime_type":"text/plain",
>> "stream_size":["null"],
>> "x_parsed_by":["org.apache.tika.parser.DefaultParser",
>>   "org.apache.tika.parser.txt.TXTParser"],
>> "stream_content_type":["text/plain"],
>> "content_encoding":["windows-1252"],
>> "content_type":["text/plain; charset=windows-1252"],
>> "content":" \n \n  \n  \n  \n  \n  \n  \n  \n \n
>>  --94eb2c1910841bc55f0547f43443\r\nContent-Type: multipart/alternative;
>> boundary=94eb2c1910841bc5530547f43441\r\n\r\n--94eb2c1910841
>> bc5530547f43441\r\nContent-Type: text/plain; charset=UTF-8\r\n\r\nthis
>> is test mail for mfc.\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type:
>> text/html; charset=UTF-8\r\n\r\nthis is test mail for
>> mfc.\r\n\r\n\r\n--94eb2c1910841bc5530547f43441--\r\n--
>> 94eb2c1910841bc55f0547f43443\r\nContent-Type: application/pdf;
>> name=\"pdf-test.pdf\"\r\nContent-Disposition: attachment;
>> filename=\"pdf-test.pdf\"\r\nContent-Transfer-Encoding:
>> base64\r\nX-Attachment-Id: f_iyvt78qa0\r\n\r\nJVBERi0xLjY
>> NJeLjz9MNCjM3IDAgb2JqIDw8L0xpbmVhcml6ZWQgMS9... ",
>> "language":"en",
>> "_version_":1558710621053124608}]
>>   }
>>
>>
>>
>> 2017-02-08 1:17 GMT+03:00 Karl Wright <daddy...@gmail.com>:
>>
>>> Here's the full code for this class:
>>>
>>> https://svn.apache.org/repos/asf/manifoldcf/trunk/connectors
>>> /email/connector/src/main/java/org/apache/manifoldcf/crawler
>>> /connectors/email/EmailConnector.java
>>>
>>> Karl
>>>
>>>
>>> On Tue, Feb 7, 2017 at 5:14 PM, Karl Wright <daddy...@gmail.com> wrote:
>>>
>>>> Hi Cihad,
>>>>
>>>> The variable atta

Re: extract email attachment

2017-02-07 Thread Cihad Guzel
Hi Karl,

Shouldn't the 'else' part be processed when the email has an attachment?

Although the email has an attachment, only the first part was processed.
Also, I don't see the attachment's content in solr index.

I edited the code for testing, as follows:

 if (attachmentIndex == null) {
  // It's an email
  System.out.println("running if block");
...
} else {
  System.out.println("running else block");
  // It's an attachment
  attachmentNumber = attachmentIndex;
...
}

Then I ran my job. It was processed 3 times. The log looks like this:

...
running if block
running if block
running if block
...


The solr response:

{
"subject":["pdf test page"],
"from":["Cihad Guzel <cguz...@gmail.com>"],
"id":"
http://sampleserver/%C4%B0%C5%9F%2FmyFolder%2Ftest?id=%3CCADNgPDgSXHeWo0GDnUL6S2sogUsXUa9mx2WxOT23Wi37Hog5Gw%40mail.gmail.com%3E
",
"date":["Tue Feb 07 20:37:35 MSK 2017"],
"mimetype":["",
  ""],
"created_date":"2017-02-07T17:37:35.000Z",
"indexed_date":"2017-02-07T21:18:05.382Z",
"to":["Cihad Guzel <cguz...@gmail.com>"],
"modified_date":"2017-02-07T17:37:35.000Z",
"encoding":["",
  ""],
"mime_type":"text/plain",
"stream_size":["null"],
"x_parsed_by":["org.apache.tika.parser.DefaultParser",
  "org.apache.tika.parser.txt.TXTParser"],
"stream_content_type":["text/plain"],
"content_encoding":["windows-1252"],
"content_type":["text/plain; charset=windows-1252"],
"content":" \n \n  \n  \n  \n  \n  \n  \n  \n \n
 --94eb2c1910841bc55f0547f43443\r\nContent-Type: multipart/alternative;
boundary=94eb2c1910841bc5530547f43441\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type:
text/plain; charset=UTF-8\r\n\r\nthis is test mail for
mfc.\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type: text/html;
charset=UTF-8\r\n\r\nthis is test mail for
mfc.\r\n\r\n\r\n--94eb2c1910841bc5530547f43441--\r\n--94eb2c1910841bc55f0547f43443\r\nContent-Type:
application/pdf; name=\"pdf-test.pdf\"\r\nContent-Disposition: attachment;
filename=\"pdf-test.pdf\"\r\nContent-Transfer-Encoding:
base64\r\nX-Attachment-Id:
f_iyvt78qa0\r\n\r\nJVBERi0xLjYNJeLjz9MNCjM3IDAgb2JqIDw8L0xpbmVhcml6ZWQgMS9...
",
"language":"en",
"_version_":1558710621053124608}]
  }



2017-02-08 1:17 GMT+03:00 Karl Wright <daddy...@gmail.com>:

> Here's the full code for this class:
>
> https://svn.apache.org/repos/asf/manifoldcf/trunk/
> connectors/email/connector/src/main/java/org/apache/
> manifoldcf/crawler/connectors/email/EmailConnector.java
>
> Karl
>
>
> On Tue, Feb 7, 2017 at 5:14 PM, Karl Wright <daddy...@gmail.com> wrote:
>
>> Hi Cihad,
>>
>> The variable attachmentIndex is *supposed* to be null except when an
>> attachment is being processed.  The code should look like this:
>>
>> if (attachmentIndex == null) {
>>   // It's an email
>> ...
>>     } else {
>>   // It's an attachment
>>   attachmentNumber = attachmentIndex;
>> ...
>> }
>>
>>
>> Karl
>>
>>
>> On Tue, Feb 7, 2017 at 4:43 PM, Cihad Guzel <cguz...@gmail.com> wrote:
>>
>>> Hi Karl,
>>>
>>> I added LOG line for testing. It looks attachmentIndex is null.
>>>
>>> 2017-02-08 0:11 GMT+03:00 Karl Wright <daddy...@gmail.com>:
>>>
>>>> I attached a second patch (to apply on top of the first patch).  Please
>>>> let me know if that fixes the issue.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Tue, Feb 7, 2017 at 3:59 PM, Cihad Guzel <cguz...@gmail.com> wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> I have an error as follow:
>>>>>
>>>>> FATAL 2017-02-07 23:56:09,483 (Worker thread '29') - Error tossed: For
>>>>> input string: "myFolder/test:<cadngpdgsxhewo0gdnul6s2sogusxua9mx2wxot23wi37hog...@mail.gmail.com>"
>>>>> java.lang.NumberFormatException: For input string: "myFolder/test:<
>>>>> cadngpdgsxhewo0gdnul6s2sogusxua9mx2wxot23wi37hog...@mail.gmail.com>"
>>>>> at java.lang.NumberFormatExceptio

Re: extract email attachment

2017-02-07 Thread Cihad Guzel
Hi Karl,

I added LOG line for testing. It looks attachmentIndex is null.

2017-02-08 0:11 GMT+03:00 Karl Wright <daddy...@gmail.com>:

> I attached a second patch (to apply on top of the first patch).  Please
> let me know if that fixes the issue.
>
> Karl
>
>
> On Tue, Feb 7, 2017 at 3:59 PM, Cihad Guzel <cguz...@gmail.com> wrote:
>
>> Hi Karl,
>>
>> I have an error as follow:
>>
>> FATAL 2017-02-07 23:56:09,483 (Worker thread '29') - Error tossed: For
>> input string: "myFolder/test:<cadngpdgsxhewo0gdnul6s2sogusxua9mx2wxot23wi37hog...@mail.gmail.com>"
>> java.lang.NumberFormatException: For input string: "myFolder/test:<
>> cadngpdgsxhewo0gdnul6s2sogusxua9mx2wxot23wi37hog...@mail.gmail.com>"
>> at java.lang.NumberFormatException.forInputString(NumberFormatE
>> xception.java:65)
>> at java.lang.Integer.parseInt(Integer.java:580)
>> at java.lang.Integer.parseInt(Integer.java:615)
>> at org.apache.manifoldcf.crawler.connectors.email.EmailConnecto
>> r.processDocuments(EmailConnector.java:705)
>> at org.apache.manifoldcf.crawler.system.WorkerThread.run(Worker
>> Thread.java:399)
>>
>>
>> 2017-02-07 22:50 GMT+03:00 Cihad Guzel <cguz...@gmail.com>:
>>
>>> Thanks Karl,
>>>
>>> I will try it.
>>>
>>> Regards
>>> Cihad Guzel
>>>
>>> 2017-02-07 22:36 GMT+03:00 Karl Wright <daddy...@gmail.com>:
>>>
>>>> I've created a ticket and attached a patch to it.  CONNECTORS-1375.
>>>> Please let me know if it works for you; if not, I'll fix what doesn't work.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Tue, Feb 7, 2017 at 1:19 PM, Karl Wright <daddy...@gmail.com> wrote:
>>>>
>>>>> Correction: the only metadata attribute we set is the attachment(s)
>>>>> mimetype (as a multivalued field) -- this doesn't currently include the
>>>>> attachment data.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Tue, Feb 7, 2017 at 1:14 PM, Karl Wright <daddy...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Cihad,
>>>>>>
>>>>>> The email connector is providing the attachment data unextracted to
>>>>>> the output connector as metadata attribute data.  There are no
>>>>>> transformation connectors that look at this metadata.  Solr cell also
>>>>>> probably does not handle binary in random metadata attributes the proper
>>>>>> way.
>>>>>>
>>>>>> The connector's attachment code therefore seems to be designed only
>>>>>> to deal with textual attachments.  The right solution is to have 
>>>>>> individual
>>>>>> IDs for each attachment.  But that would also require there to be a URL 
>>>>>> we
>>>>>> could construct for each attachment.  We could provide an additional URI
>>>>>> template for attachments, but I'd wonder if your system has the ability 
>>>>>> to
>>>>>> serve attachments by their own URLs.  Please let me know if this would 
>>>>>> work
>>>>>> and if so I can create a ticket and work on making these changes.
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 7, 2017 at 12:56 PM, Cihad Guzel <cguz...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I try the email connector with gmail. I attach the file [1] in my
>>>>>>> new email. And sent to my test email adress.
>>>>>>>
>>>>>>> My mail content body is like: "this is test mail for mfc"
>>>>>>>
>>>>>>> Then I run my email job and the email is indexed to Solr
>>>>>>> successfully. But, the solr's content field have not my attachment's
>>>>>>> content body. Solr content filed looks like:
>>>>>>>
>>>>>>> "content":" \n \n  \n  \n  \n  \n  \n  \n  \n \n
>>>>>>>  --94eb2c1910841bc55f0547f43443\r\nContent-Type:
>>>>>>> multipart/alternative; boundary=94eb2c1910841bc553054
>>>>>>> 7f43441\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type:
>>>>>>> text/plain; charset=UTF-8\r\n\r\nthis is test mail for
>>>>>>> mfc.\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type:
>>>>>>> text/html; charset=UTF-8\r\n\r\nthis is test mail for
>>>>>>> mfc.\r\n\r\n\r\n--94eb2c1910841bc5530547f43441--\r\n--
>>>>>>> 94eb2c1910841bc55f0547f43443\r\nContent-Type: application/pdf;
>>>>>>> name=\"pdf-test.pdf\"\r\nContent-Disposition: attachment;
>>>>>>> filename=\"pdf-test.pdf\"\r\nContent-Transfer-Encoding:
>>>>>>> base64\r\nX-Attachment-Id: f_iyvt78qa0\r\n\r\nJVBERi0xLjY
>>>>>>> NJeLjz9MNCjM3IDAgb2JqIDw8L0xpbmVhcml6ZWQgMS9MIDIwNTk3L08gNDA
>>>>>>> vRSAx\r\nNDExNS9OIDEvVCAxOTc5NS9IIFsgMTAwNSAyMTVdPj4NZW5kb2J
>>>>>>> qDSAgICAgICAgICAgICAgICAg\r\nDQp4cmVmDQozNyAzNA0KMDAwMDAwMDA
>>>>>>> xNiAwMDAwMCBuDQowMDAwMDAxMzg2IDAwMDAwIG4NCjAw\r\nMDAwMDE1MjIgMDAwM
>>>>>>> ..."
>>>>>>>
>>>>>>> Does the MFC email connector know that the attachment's file type is
>>>>>>> pdf? Does not extract the contents?
>>>>>>>
>>>>>>> [1] http://www.orimi.com/pdf-test.pdf
>>>>>>> --
>>>>>>> Regards
>>>>>>> Cihad Güzel
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Teşekkürler
>>> Cihad Güzel
>>>
>>
>>
>>
>> --
>> Teşekkürler
>> Cihad Güzel
>>
>
>


-- 
Thanks
Cihad Güzel


Re: extract email attachment

2017-02-07 Thread Cihad Guzel
Hi Karl,

I have an error as follow:

FATAL 2017-02-07 23:56:09,483 (Worker thread '29') - Error tossed: For
input string: "myFolder/test:<cadngpdgsxhewo0gdnul6s2sogusxua9mx2wxot23wi37hog...@mail.gmail.com>"
java.lang.NumberFormatException: For input string: "myFolder/test:<
cadngpdgsxhewo0gdnul6s2sogusxua9mx2wxot23wi37hog...@mail.gmail.com>"
at java.lang.NumberFormatException.forInputString(
NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at org.apache.manifoldcf.crawler.connectors.email.EmailConnector.
processDocuments(EmailConnector.java:705)
at org.apache.manifoldcf.crawler.system.WorkerThread.run(
WorkerThread.java:399)


2017-02-07 22:50 GMT+03:00 Cihad Guzel <cguz...@gmail.com>:

> Thanks Karl,
>
> I will try it.
>
> Regards
> Cihad Guzel
>
> 2017-02-07 22:36 GMT+03:00 Karl Wright <daddy...@gmail.com>:
>
>> I've created a ticket and attached a patch to it.  CONNECTORS-1375.
>> Please let me know if it works for you; if not, I'll fix what doesn't work.
>>
>> Karl
>>
>>
>> On Tue, Feb 7, 2017 at 1:19 PM, Karl Wright <daddy...@gmail.com> wrote:
>>
>>> Correction: the only metadata attribute we set is the attachment(s)
>>> mimetype (as a multivalued field) -- this doesn't currently include the
>>> attachment data.
>>>
>>> Karl
>>>
>>>
>>> On Tue, Feb 7, 2017 at 1:14 PM, Karl Wright <daddy...@gmail.com> wrote:
>>>
>>>> Hi Cihad,
>>>>
>>>> The email connector is providing the attachment data unextracted to the
>>>> output connector as metadata attribute data.  There are no transformation
>>>> connectors that look at this metadata.  Solr cell also probably does not
>>>> handle binary in random metadata attributes the proper way.
>>>>
>>>> The connector's attachment code therefore seems to be designed only to
>>>> deal with textual attachments.  The right solution is to have individual
>>>> IDs for each attachment.  But that would also require there to be a URL we
>>>> could construct for each attachment.  We could provide an additional URI
>>>> template for attachments, but I'd wonder if your system has the ability to
>>>> serve attachments by their own URLs.  Please let me know if this would work
>>>> and if so I can create a ticket and work on making these changes.
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>>
>>>> On Tue, Feb 7, 2017 at 12:56 PM, Cihad Guzel <cguz...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I try the email connector with gmail. I attach the file [1] in my new
>>>>> email. And sent to my test email adress.
>>>>>
>>>>> My mail content body is like: "this is test mail for mfc"
>>>>>
>>>>> Then I run my email job and the email is indexed to Solr successfully.
>>>>> But, the solr's content field have not my attachment's content body. Solr
>>>>> content filed looks like:
>>>>>
>>>>> "content":" \n \n  \n  \n  \n  \n  \n  \n  \n \n
>>>>>  --94eb2c1910841bc55f0547f43443\r\nContent-Type:
>>>>> multipart/alternative; boundary=94eb2c1910841bc553054
>>>>> 7f43441\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type:
>>>>> text/plain; charset=UTF-8\r\n\r\nthis is test mail for
>>>>> mfc.\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type:
>>>>> text/html; charset=UTF-8\r\n\r\nthis is test mail for
>>>>> mfc.\r\n\r\n\r\n--94eb2c1910841bc5530547f43441--\r\n--
>>>>> 94eb2c1910841bc55f0547f43443\r\nContent-Type: application/pdf;
>>>>> name=\"pdf-test.pdf\"\r\nContent-Disposition: attachment;
>>>>> filename=\"pdf-test.pdf\"\r\nContent-Transfer-Encoding:
>>>>> base64\r\nX-Attachment-Id: f_iyvt78qa0\r\n\r\nJVBERi0xLjY
>>>>> NJeLjz9MNCjM3IDAgb2JqIDw8L0xpbmVhcml6ZWQgMS9MIDIwNTk3L08gNDA
>>>>> vRSAx\r\nNDExNS9OIDEvVCAxOTc5NS9IIFsgMTAwNSAyMTVdPj4NZW5kb2J
>>>>> qDSAgICAgICAgICAgICAgICAg\r\nDQp4cmVmDQozNyAzNA0KMDAwMDAwMDA
>>>>> xNiAwMDAwMCBuDQowMDAwMDAxMzg2IDAwMDAwIG4NCjAw\r\nMDAwMDE1MjIgMDAwM
>>>>> ..."
>>>>>
>>>>> Does the MFC email connector know that the attachment's file type is
>>>>> pdf? Does not extract the contents?
>>>>>
>>>>> [1] http://www.orimi.com/pdf-test.pdf
>>>>> --
>>>>> Regards
>>>>> Cihad Güzel
>>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> Teşekkürler
> Cihad Güzel
>



-- 
Thanks
Cihad Güzel


Re: extract email attachment

2017-02-07 Thread Cihad Guzel
Thanks Karl,

I will try it.

Regards
Cihad Guzel

2017-02-07 22:36 GMT+03:00 Karl Wright <daddy...@gmail.com>:

> I've created a ticket and attached a patch to it.  CONNECTORS-1375.
> Please let me know if it works for you; if not, I'll fix what doesn't work.
>
> Karl
>
>
> On Tue, Feb 7, 2017 at 1:19 PM, Karl Wright <daddy...@gmail.com> wrote:
>
>> Correction: the only metadata attribute we set is the attachment(s)
>> mimetype (as a multivalued field) -- this doesn't currently include the
>> attachment data.
>>
>> Karl
>>
>>
>> On Tue, Feb 7, 2017 at 1:14 PM, Karl Wright <daddy...@gmail.com> wrote:
>>
>>> Hi Cihad,
>>>
>>> The email connector is providing the attachment data unextracted to the
>>> output connector as metadata attribute data.  There are no transformation
>>> connectors that look at this metadata.  Solr cell also probably does not
>>> handle binary in random metadata attributes the proper way.
>>>
>>> The connector's attachment code therefore seems to be designed only to
>>> deal with textual attachments.  The right solution is to have individual
>>> IDs for each attachment.  But that would also require there to be a URL we
>>> could construct for each attachment.  We could provide an additional URI
>>> template for attachments, but I'd wonder if your system has the ability to
>>> serve attachments by their own URLs.  Please let me know if this would work
>>> and if so I can create a ticket and work on making these changes.
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>> On Tue, Feb 7, 2017 at 12:56 PM, Cihad Guzel <cguz...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I try the email connector with gmail. I attach the file [1] in my new
>>>> email. And sent to my test email adress.
>>>>
>>>> My mail content body is like: "this is test mail for mfc"
>>>>
>>>> Then I run my email job and the email is indexed to Solr successfully.
>>>> But, the solr's content field have not my attachment's content body. Solr
>>>> content filed looks like:
>>>>
>>>> "content":" \n \n  \n  \n  \n  \n  \n  \n  \n \n
>>>>  --94eb2c1910841bc55f0547f43443\r\nContent-Type:
>>>> multipart/alternative; boundary=94eb2c1910841bc553054
>>>> 7f43441\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type:
>>>> text/plain; charset=UTF-8\r\n\r\nthis is test mail for
>>>> mfc.\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type: text/html;
>>>> charset=UTF-8\r\n\r\nthis is test mail for
>>>> mfc.\r\n\r\n\r\n--94eb2c1910841bc5530547f43441--\r\n--
>>>> 94eb2c1910841bc55f0547f43443\r\nContent-Type: application/pdf;
>>>> name=\"pdf-test.pdf\"\r\nContent-Disposition: attachment;
>>>> filename=\"pdf-test.pdf\"\r\nContent-Transfer-Encoding:
>>>> base64\r\nX-Attachment-Id: f_iyvt78qa0\r\n\r\nJVBERi0xLjY
>>>> NJeLjz9MNCjM3IDAgb2JqIDw8L0xpbmVhcml6ZWQgMS9MIDIwNTk3L08gNDA
>>>> vRSAx\r\nNDExNS9OIDEvVCAxOTc5NS9IIFsgMTAwNSAyMTVdPj4NZW5kb2J
>>>> qDSAgICAgICAgICAgICAgICAg\r\nDQp4cmVmDQozNyAzNA0KMDAwMDAwMDA
>>>> xNiAwMDAwMCBuDQowMDAwMDAxMzg2IDAwMDAwIG4NCjAw\r\nMDAwMDE1MjIgMDAwM ..."
>>>>
>>>> Does the MFC email connector know that the attachment's file type is
>>>> pdf? Does not extract the contents?
>>>>
>>>> [1] http://www.orimi.com/pdf-test.pdf
>>>> --
>>>> Regards
>>>> Cihad Güzel
>>>>
>>>
>>>
>>
>


-- 
Thanks
Cihad Güzel


Re: email connector filtering

2017-01-16 Thread Cihad Guzel
Hi Karl,

I looked at the email connector code lines, but the "date" field is not being
used for filtering. I think it should be added. I created an issue:
https://issues.apache.org/jira/browse/CONNECTORS-1368
and created a PR on GitHub: https://github.com/apache/manifoldcf/pull/15
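
For illustration, here is a minimal standalone sketch (not the PR code itself;
the date value format is an assumption) of how a date clause can be combined
with the existing subject/from/to/body clauses using javax.mail search terms:

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import javax.mail.search.AndTerm;
import javax.mail.search.ComparisonTerm;
import javax.mail.search.SearchTerm;
import javax.mail.search.SentDateTerm;
import javax.mail.search.SubjectTerm;

public class DateFilterSketch {
  public static void main(String[] args) throws Exception {
    // Assumed filter value format (yyyy-MM-dd); the real UI value may differ.
    Date since = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT).parse("2017-01-01");
    SearchTerm dateClause = new SentDateTerm(ComparisonTerm.GE, since);
    SearchTerm searchTerm = new AndTerm(new SubjectTerm("test"), dateClause);
    // Passing searchTerm to folder.search(...) would match messages whose
    // subject contains "test" and that were sent on or after the given date.
    System.out.println(searchTerm.getClass().getName());
  }
}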

Regards
Cihad Guzel



2017-01-15 14:20 GMT+03:00 Cihad Guzel <cguz...@gmail.com>:

> Thanks Karl for your information.
>
> Regards
> Cihad Guzel
>
> 2017-01-14 23:41 GMT+03:00 Karl Wright <daddy...@gmail.com>:
>
>> Hi Cihad,
>>
>> The email connector uses the standard email java class to search.  Here's
>> the code for constructing that search:
>>
>> >>>>>>
>> SearchTerm searchTerm = null;
>>
>> Iterator<Map.Entry<String,String>> it =
>> findMap.entrySet().iterator();
>> while (it.hasNext()) {
>>   Map.Entry<String,String> pair = it.next();
>>   findParameterName = pair.getKey().toLowerCase(Locale.ROOT);
>>   findParameterValue = pair.getValue();
>>   if (Logging.connectors.isDebugEnabled())
>> Logging.connectors.debug("Email: Finding emails where '" +
>> findParameterName +
>> "' = '" + findParameterValue + "'");
>>   SearchTerm searchClause = null;
>>   if (findParameterName.equals(EmailConfig.EMAIL_SUBJECT)) {
>> searchClause = new SubjectTerm(findParameterValue);
>>   } else if (findParameterName.equals(EmailConfig.EMAIL_FROM)) {
>> searchClause = new FromStringTerm(findParameterValue);
>>   } else if (findParameterName.equals(EmailConfig.EMAIL_TO)) {
>> searchClause = new RecipientStringTerm(Message.RecipientType.TO,
>> findParameterValue);
>>   } else if (findParameterName.equals(EmailConfig.EMAIL_BODY)) {
>> searchClause = new BodyTerm(findParameterValue);
>>   }
>>
>>   if (searchClause != null)
>>   {
>> if (searchTerm == null)
>>   searchTerm = searchClause;
>> else
>>   searchTerm = new AndTerm(searchTerm, searchClause);
>>   }
>>   else
>>   {
>> Logging.connectors.warn("Email: Unknown filter parameter name:
>> '"+findParameterName+"'");
>>   }
>> }
>> <<<<<<
>>
>> So you construct a search as basically a set of AND clauses, where each
>> AND clause is either a "subject", "from", "to", or "body" match.  What the
>> email java class does with that search I am not sure; I'd play with it a
>> bit to see.
>>
>> Thanks,
>> Karl
>>
>>
>> On Sat, Jan 14, 2017 at 12:02 PM, Cihad Guzel <cguz...@gmail.com> wrote:
>>
>>> Hi Karl,
>>>
>>> I try email connector. There are some filter field for email as from,
>>> to, body, subject, date. How does the filter works?  What should I write in
>>> this filter inputs, especially the date and body field?
>>>
>>> What is the pattern for the filter fields?
>>> Only year or timestamp or range for date field?
>>> Full text or regex or only one word for the another fields (to, from,
>>> subject, body) ?
>>>
>>> I haven't seen any documents related to this matter. Could you help me?
>>>
>>> --
>>> Thanks
>>> Cihad Guzel
>>>
>>
>>
>
>
> --
> Teşekkürler
> Cihad Güzel
>



-- 
Thanks
Cihad Güzel


email connector filtering

2017-01-14 Thread Cihad Guzel
Hi Karl,

I try email connector. There are some filter field for email as from, to,
body, subject, date. How does the filter works?  What should I write in
this filter inputs, especially the date and body field?

What is the pattern for the filter fields?
Only year or timestamp or range for date field?
Full text or regex or only one word for the another fields (to, from,
subject, body) ?

I haven't seen any documents related to this matter. Could you help me?

-- 
Thanks
Cihad Guzel


Re: Sharepoint get ACL

2016-12-30 Thread Cihad Guzel
Hi Karl,

I have changed the authority to Native SharePoint instead of Active Directory,
and now I can see the allow tokens in the Solr index as follows:

"allow_token_document":["Authority+Group:Ui%3A0%23.w%7Clagom%5Cadministrator",
  "Authority+Group:GExcel+Services+Viewers",
  "Authority+Group:GRestricted+Readers",
  "Authority+Group:Gtestsite+Members",
  "Authority+Group:GHierarchy+Managers",
  "Authority+Group:GApprovers",
  "Authority+Group:Gtestsite+Visitors",
  "Authority+Group:Gtestsite+Owners",
  "Authority+Group:GDesigners"],


If I select the "Active Directory" setting, I don't see any tokens.

"allow_token_document":["Authority+Group:"],

I tried user profile synchronization from Active Directory. I followed
https://blogs.technet.microsoft.com/meacoex/2013/08/04/step-by-step-active-directory-import-for-sharepoint-2013/
I could see all Active Directory users in SharePoint. Then I requested
GetUserInfo and GetGroupCollectionFromUser from the SharePoint API via soapUI,
but the Sid field is empty for every user; the SOAP responses come back with
blank Sid attributes on each user entry.

I created an issue on Stack Exchange. You can see it here:
http://sharepoint.stackexchange.com/questions/203761/sid-have-empty-values-after-sharepoint-userprofile-sync
I can see the "sid" value if I request the SharePoint REST API like this:
/_api/sp.userprofiles.peoplemanager/getuserprofilepropertyfor(accountname=@v,
propertyname='SID')?@v='testdomain\testUser'

The response contains the SID value:

S-1-5-21-151231991-263585328-740192949-1109

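For reference, a minimal sketch (hypothetical host, credentials and workstation
name) of making that REST call with Apache HttpClient and NTLM credentials:

import org.apache.http.auth.AuthScope;
import org.apache.http.auth.NTCredentials;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class SidLookupSketch {
  public static void main(String[] args) throws Exception {
    BasicCredentialsProvider creds = new BasicCredentialsProvider();
    creds.setCredentials(AuthScope.ANY,
        new NTCredentials("testUser", "password", "workstation", "testdomain"));
    try (CloseableHttpClient client =
        HttpClients.custom().setDefaultCredentialsProvider(creds).build()) {
      String url = "http://sharepoint.example.com/_api/sp.userprofiles.peoplemanager/"
          + "getuserprofilepropertyfor(accountname=@v,propertyname='SID')"
          + "?@v='testdomain%5CtestUser'";
      HttpGet get = new HttpGet(url);
      get.addHeader("Accept", "application/xml");
      try (CloseableHttpResponse response = client.execute(get)) {
        // The response body should contain the user's SID value.
        System.out.println(EntityUtils.toString(response.getEntity()));
      }
    }
  }
}
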
Then I saw a ManifoldCF issue:
https://issues.apache.org/jira/browse/CONNECTORS-754 . The issue was
resolved, but I'm having the same problem.







2016-12-28 14:41 GMT+03:00 Karl Wright <daddy...@gmail.com>:

> Hi Cihad,
>
> In your case, then, the connector is calling the "Users:
> GetUserCollectionFromGroup" SOAP method in the SharePoint API.  This
> method is supposed to list the users that belong to the group, but I
> suspect that your SharePoint instance is not set up to work in that way,
> and that you should in fact set your MCF up as follows:
>
> - Do NOT select the "Active directory" setting.  Use "claims-based"
> instead.
> - Use the appropriate SharePoint "native" authority.
>
> Read up on how to do that here:
>
> http://manifoldcf.apache.org/release/release-2.5/en_US/end-
> user-documentation.html#sharepointrepository
>
> Thanks,
> Karl
>
>
> On Wed, Dec 28, 2016 at 6:26 AM, Cihad Guzel <cguz...@gmail.com> wrote:
>
>> Hi Karl,
>>
>> I selected "Active Directory". My SharePoint server run with Active
>> Directory.
>>
>> 2016-12-28 14:13 GMT+03:00 Karl Wright <daddy...@gmail.com>:
>>
>>> Hi Cihad,
>>>
>>> The code for looking for document ACLs is as follows:
>>>
>>> >>>>>>
>>> Object node = nodeList.get( i );
>>> String mask = doc.getValue( node, "Mask" );
>>> long maskValue = new Long(mask).longValue();
>>> if ((maskValue & 1L) == 1L)
>>> {
>>>   // Permission to view
>>>   String isUser = doc.getValue( node, "MemberIsUser" );
>>>
>>>   if ( isUser.compareToIgnoreCase("True") == 0 )
>>>   {
>>> // Use AD user or group
>>> String userLogin = doc.getValue( node, "UserLogin" );
>>> String userSid = getSidForUser( userCall, userLogin,
>>> activeDirectoryAuthority );
>>> sids.add( userSid );
>>>   }
>>>   else
>>>   {
>>> // Role
>>> List roleSids;
>>> String roleName = doc.getValue( node, "RoleName" );
>>> if ( roleName.length() == 0)
>>> {
>>>   roleName = doc.getValue(node,"GroupName");
>>>   roleSids = getSidsForGroup(userCall, roleName,
>>> activeDirectoryAuthority);
>>> }
>>> else
>>> {
>>>   roleSids = getSidsForRole(userCall, roleName,
>>> activeDirectoryAuthority);
>>> }
>>>

Sharepoint get ACL

2016-12-27 Thread Cihad Guzel
Hi,

I am trying MFC with SharePoint 2013. First, I installed the SharePoint plugin
and then ran my job. My files in SharePoint are indexed to Solr successfully,
but I don't see the ACLs in the Solr index. You can see my sample Solr data as
follows:

"filename":"Sample.doc",
"allow_token_document":["Authority+Group:"],
"deny_token_document":["Authority+Group:DEAD_AUTHORITY"],
"deny_token_parent":["__nosecurity__"],
"allow_token_share":["__nosecurity__"],
"allow_token_parent":["__nosecurity__"],
"deny_token_share":["__nosecurity__"],

I ran the SharePoint connector in debug mode and followed the ManifoldCF log,
but I don't see any error in it. I can see the "getDocumentACLs xml response:"
entries in the log.

What should I do to solve this problem?


-- 
Regards
Cihad Güzel


Re: Invalid date format for modified_date

2016-11-20 Thread Cihad Guzel
Thanks Karl.

--
Kind Regards
Cihad Güzel

2016-11-20 19:51 GMT+03:00 Karl Wright <daddy...@gmail.com>:

> I just committed the fix.
>
> Thanks!
> Karl
>
>
> On Sun, Nov 20, 2016 at 11:47 AM, Furkan KAMACI <furkankam...@gmail.com>
> wrote:
>
>> Hi Karl,
>>
>> I verify that modified date is not being sent as a valid ISO date. It has
>> such numbers prepended to it.
>>
>> Kind Regards,
>> Furkan KAMACI
>>
>> On Sun, Nov 20, 2016 at 6:32 PM, Cihad Guzel <cguz...@gmail.com> wrote:
>>
>>> Hi Karl,
>>>
>>> I have created a pull request on github. You can see the problem from
>>> here: https://github.com/apache/manifoldcf/pull/10/commits/6
>>> a71a44ead5507c00302cb3a0a6a96d2bd2a02ce
>>>
>>> 2016-11-20 19:25 GMT+03:00 Karl Wright <daddy...@gmail.com>:
>>>
>>>> The code for formatting a date is here:
>>>>
>>>> >>>>>>
>>>>   public static String formatISO8601Date(Date dateValue)
>>>>   {
>>>> java.text.DateFormat df = new 
>>>> java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",
>>>> Locale.ROOT);
>>>> df.setTimeZone(TimeZone.getTimeZone("GMT"));
>>>> return df.format(dateValue);
>>>>   }
>>>>
>>>> <<<<<<
>>>>
>>>> The code that fills in the modified_date attribute in the Solr
>>>> connector is here:
>>>>
>>>> >>>>>>
>>>>   if ( modifiedDateAttributeName != null )
>>>>   {
>>>> Date date = document.getModifiedDate();
>>>> if ( date != null )
>>>> {
>>>>   outputDoc.addField( modifiedDateAttributeName,
>>>> DateParser.formatISO8601Date( date ) );
>>>> }
>>>>   }
>>>> <<<<<<
>>>>
>>>> I can't see any way that a number could be prepended to this.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Sun, Nov 20, 2016 at 11:08 AM, Cihad Guzel <cguz...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I use SharedDrive and Solr connector. I run my job and I have seen
>>>>> invalid date format in modified_date field in my Solr index.  Otherwise 
>>>>> the
>>>>> size metadata is mising in solr index. The invalid metadata  as
>>>>> follow: "modified_date": "102400 2016-11-17T13:21:22.163Z"
>>>>>
>>>>> "102400" is unnecessary value and it should not be added in
>>>>> modified_date. I found the problem and edited it. I will create a pull
>>>>> request on github.
>>>>>
>>>>> --
>>>>> Kind Regards
>>>>> Cihad Güzel
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Teşekkürler
>>> Cihad Güzel
>>>
>>
>>
>


-- 
Thanks
Cihad Güzel


Re: Invalid date format for modified_date

2016-11-20 Thread Cihad Guzel
Hi Karl,

I have created a pull request on github. You can see the problem from here:
https://github.com/apache/manifoldcf/pull/10/commits/6a71a44ead5507c00302cb3a0a6a96d2bd2a02ce

2016-11-20 19:25 GMT+03:00 Karl Wright <daddy...@gmail.com>:

> The code for formatting a date is here:
>
> >>>>>>
>   public static String formatISO8601Date(Date dateValue)
>   {
> java.text.DateFormat df = new 
> java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",
> Locale.ROOT);
> df.setTimeZone(TimeZone.getTimeZone("GMT"));
> return df.format(dateValue);
>   }
>
> <<<<<<
>
> The code that fills in the modified_date attribute in the Solr connector
> is here:
>
> >>>>>>
>   if ( modifiedDateAttributeName != null )
>   {
> Date date = document.getModifiedDate();
> if ( date != null )
> {
>   outputDoc.addField( modifiedDateAttributeName,
> DateParser.formatISO8601Date( date ) );
> }
>   }
> <<<<<<
>
> I can't see any way that a number could be prepended to this.
>
> Karl
>
>
> On Sun, Nov 20, 2016 at 11:08 AM, Cihad Guzel <cguz...@gmail.com> wrote:
>
>> Hi,
>>
>> I use SharedDrive and Solr connector. I run my job and I have seen
>> invalid date format in modified_date field in my Solr index.  Otherwise the
>> size metadata is mising in solr index. The invalid metadata  as
>> follow: "modified_date": "102400 2016-11-17T13:21:22.163Z"
>>
>> "102400" is unnecessary value and it should not be added in
>> modified_date. I found the problem and edited it. I will create a pull
>> request on github.
>>
>> --
>> Kind Regards
>> Cihad Güzel
>>
>
>


-- 
Thanks
Cihad Güzel


Invalid date format for modified_date

2016-11-20 Thread Cihad Guzel
Hi,

I use the SharedDrive and Solr connectors. I ran my job and saw an invalid
date format in the modified_date field in my Solr index. In addition, the size
metadata is missing from the Solr index. The invalid value looks as follows:
"modified_date": "102400 2016-11-17T13:21:22.163Z"

"102400" is an unnecessary value and should not be added to modified_date.
I found the problem and edited it. I will create a pull request on github.
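
As a quick standalone check (a sketch, not connector code), a strict ISO-8601
parse shows why the prepended size breaks the value:

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class ModifiedDateCheck {
  public static void main(String[] args) {
    SimpleDateFormat df =
        new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", Locale.ROOT);
    df.setTimeZone(TimeZone.getTimeZone("GMT"));
    df.setLenient(false);
    for (String value : new String[] {
        "102400 2016-11-17T13:21:22.163Z", "2016-11-17T13:21:22.163Z"}) {
      try {
        // Only the bare timestamp parses; the size-prefixed value fails.
        System.out.println(value + " -> " + df.parse(value));
      } catch (ParseException e) {
        System.out.println(value + " -> invalid: " + e.getMessage());
      }
    }
  }
}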

-- 
Kind Regards
Cihad Güzel


Is there any way to index metadata?

2016-11-14 Thread Cihad Guzel
Hi

I want to see the resource type (SharePoint, database, file share, etc.) for
faceted search in my Solr index.

I followed the "Metadata Adjuster" topic in the documentation. I added a new
field to the Solr schema and reloaded the Solr core.



I can see the field from the "Schema Browser" menu in the Solr UI. Then I added
a new transformation connection and added a metadata expression on the
"Metadata Expressions" tab of the job editing page in the ManifoldCF UI, as
follows:

Parameter name=windows, expression= repositorytype

I ran my job, but I don't see my metadata field in the Solr response.

What is missing? Is there a different way to send the resource type to the
output connector?


-- 
Thanks
Cihad Güzel


Re: I don't see my file content in solr index

2016-10-31 Thread Cihad Guzel
Hi Karl,

I tried the output connector with Solr 4.4.0, Solr 5.5.3 and Solr 6.2.1.

If I use Solr 4.4.0, everything looks OK in the Simple History and in the Solr
index.
If I use Solr 5.5.3 or Solr 6.2.1, everything looks OK in the Simple History,
but I don't see my files' content in the Solr index. I can see the other fields
(mimetype, contenttype, etc.).

I followed the HTTP traffic using Wireshark and I can see the content there,
so the problem may be on the Solr side. But I doubt whether ManifoldCF supports
Solr 5.x and 6.x. Do I need another setting for Solr 5.x and 6.x?
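
One way to check what actually reached the index is to fetch one document and
print its stored fields. A sketch (the Solr URL, core name and document id are
assumptions; adjust them) using SolrJ:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class CheckStoredFields {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient solr =
        new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
      SolrQuery query = new SolrQuery("id:\"file://some/indexed/file.pdf\"");
      query.setFields("*");
      QueryResponse response = solr.query(query);
      for (SolrDocument doc : response.getResults()) {
        // Print every stored field, so it is easy to see whether "content"
        // was stored for this document or not.
        for (String name : doc.getFieldNames()) {
          System.out.println(name + " = " + doc.getFieldValue(name));
        }
      }
    }
  }
}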

Thanks,
Cihad Guzel


2016-08-23 4:19 GMT+03:00 Karl Wright <daddy...@gmail.com>:

> Hi Cihad,
>
> When you say "the file is indexed successfully", what do you mean?  Do you
> mean that the Simple History shows a successful index attempt for the PDF
> file in question?  Or does it show that the document was rejected for some
> reason?
>
> If everything looks OK in the Simple History, then the problem has to be
> on the Solr side.  Please look at the Solr logs to see if the document was
> sent in, and what Solr did with it.
>
> Thanks,
> Karl
>
>
> On Mon, Aug 22, 2016 at 12:54 PM, Cihad Guzel <cguz...@gmail.com> wrote:
>
>> Hi
>>
>> I am new for ManifoldCF. I have defined a job and run it. The file is
>> indexed successfully. All metadata is indexed, but I don't see the pdf file
>> content in solr index. What could been the reason?
>>
>> Thanks
>> Cihad Guzel
>>
>
>


-- 
Thanks
Cihad Güzel