Re: [Dspace-tech] Discovery Indexing questions

2015-01-13 Thread Hilton Gibson
On 14 January 2015 at 01:23, Monika C. Mevenkamp moni...@princeton.edu
wrote:

 Is it true that the database browse indexes become obsolete when using
 discovery/solr ?

Hi Monika

Yes. See: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Browse_Indexes

Cheers

hg


*Hilton Gibson*
Ubuntu Linux Systems Administrator
JS Gericke Library
Room 1025C
Stellenbosch University
Private Bag X5036
Stellenbosch
7599
South Africa

Tel: +27 21 808 4100 | Cell: +27 84 646 4758
--
New Year. New Location. New Benefits. New Data Center in Ashburn, VA.
GigeNET is offering a free month of service with a new server in Ashburn.
Choose from 2 high performing configs, both with 100TB of bandwidth.
Higher redundancy.Lower latency.Increased capacity.Completely compliant.
http://p.sf.net/sfu/gigenet___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

Re: [Dspace-tech] Export Import using SAF technique (Nada Abo Eita)

2015-01-13 Thread IdeaFix
I think it is a bad idea to migrate using SAF. We did this with our archive
and found that it does not preserve the bitstream order (the number in the
bitstream URL). For example, suppose you have an item at
http://yourreponame.org/bitstream/123456789/5/file.pdf. When you export and
import it again, you will get
http://yourNEWreponame.org/bitstream/123456789/1/file.pdf: not /5/ but /1/!
If an item has more than one document, or a file was uploaded several times
(for example to replace it with a newer version), you will end up with broken
links in search engine caches and everywhere else the old link is used.

It is difficult for me to describe this in English, but if you are interested
in my experience, write to me and I will try to give you some useful
information.
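
To make the problem concrete, here is a minimal Python sketch that compares bitstream URLs before and after a migration and lists the paths that changed, so that redirects could be set up for them. It only assumes the URL shape used in the example above; the function names and the second sample pair are invented for illustration and are not part of DSpace.

```python
from urllib.parse import urlparse


def bitstream_path(url):
    """Return the path part of a bitstream URL, ignoring the host name."""
    return urlparse(url).path


def redirects_needed(pairs):
    """Given (old_url, new_url) pairs, map each changed old path to its new path."""
    redirects = {}
    for old_url, new_url in pairs:
        old_path, new_path = bitstream_path(old_url), bitstream_path(new_url)
        if old_path != new_path:
            redirects[old_path] = new_path
    return redirects


# Hypothetical sample data: the first pair is the case described above,
# the second pair is an unchanged link that needs no redirect.
pairs = [
    ("http://yourreponame.org/bitstream/123456789/5/file.pdf",
     "http://yourNEWreponame.org/bitstream/123456789/1/file.pdf"),
    ("http://yourreponame.org/bitstream/123456789/1/other.pdf",
     "http://yourNEWreponame.org/bitstream/123456789/1/other.pdf"),
]
print(redirects_needed(pairs))
```

Only the first pair ends up in the result, because its path differs once the host name is ignored.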


Re: [Dspace-tech] Help on appending files to POST_ITEM requests to REST-API

2015-01-13 Thread Peter Dietz
Hey Bruno,

Sorry for the delay, very busy.

Also, sorry that the documentation for DSpace 5 REST API isn't complete
yet. Feel free to contribute as you discover how things work.


It looks like you need to first create the item, as you are currently doing
it:
POST /collections/{collectionID}/items

(Don't post the bitstream with it: only the item metadata is respected, and
any bitstream in the item is ignored.)

That returns an Item, including its ID.

Then, you can pass the data of your file to
POST /items/{itemID}/bitstreams

Reading the code might be necessary, until the documentation is fully
comprehensive.
https://github.com/DSpace/DSpace/blob/master/dspace-rest/src/main/java/org/dspace/rest/ItemsResource.java#L411
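
The two steps above could be sketched in Python roughly as follows. This only builds the two requests and does not send them. The endpoint paths and the rest-dspace-token header come from this thread; the JSON body shape, the token value, and the item ID 42 are assumptions for illustration, so check them against ItemsResource.java before relying on them.

```python
import json
from urllib.request import Request


def create_item_request(base_url, collection_id, metadata, token):
    """Step 1: POST the item metadata only (no bitstreams) to the collection."""
    # Assumption: the item is sent as JSON with a "metadata" list.
    body = json.dumps({"metadata": metadata}).encode("utf-8")
    return Request(
        "%s/collections/%s/items" % (base_url, collection_id),
        data=body,
        headers={"Content-Type": "application/json", "rest-dspace-token": token},
        method="POST",
    )


def add_bitstream_request(base_url, item_id, file_bytes, token):
    """Step 2: POST the raw file data to the item created in step 1."""
    return Request(
        "%s/items/%s/bitstreams" % (base_url, item_id),
        data=file_bytes,
        headers={"rest-dspace-token": token},
        method="POST",
    )


base = "https://localhost:8443/rest"
step1 = create_item_request(base, 1, [{"key": "dc.title", "value": "TITLE"}], "TOKEN")
# 42 stands in for the item ID returned by step 1 (hypothetical value).
step2 = add_bitstream_request(base, 42, b"file contents", "TOKEN")
print(step1.get_method(), step1.full_url)
print(step2.get_method(), step2.full_url)
```

To actually send a request, pass it to urllib.request.urlopen and read the item ID from the first response before building the second request.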


Peter Dietz
Longsight
www.longsight.com
pe...@longsight.com
p: 740-599-5005 x809

On Mon, Jan 12, 2015 at 7:36 AM, Bruno Zanette brunonzane...@gmail.com
wrote:

 anyone?

 2015-01-09 12:35 GMT-02:00 Bruno Zanette brunonzane...@gmail.com:
  Hi, I'm trying to upload files via the REST API but it is not working.
 
  The way I'm making the requests, the items are created successfully
  but without any file.
 
  To test, I'm using curl with the following options:
 
  curl -v -k -i -4 \
    -H "Content-Type: application/zip" \
    -H "rest-dspace-token: " \
    -X POST https://localhost:8443/rest/collections/1/items \
    -d...@request.xml
 
  The content of the file request.xml is at the end of this message. I wrote
  it based on the result of a GET_ITEM request for an item submitted via
  the XMLUI.
 
  I'm using the DSpace master branch (5.0 RC3), installed on Ubuntu 14.04.
 
  Does anyone know how to do it?
  Am I doing the request correctly?
  Any tips?
 
  I've already tried to send zip files, but that fails with an
  "Unsupported media type" message...
 
  Thanks!
 
 
 
  <item>
  <name>TITLE</name>
  <type>item</type>
  <archived>true</archived>
 
  <bitstreams>
  <id>1</id>
  <name>servicedocument</name>
  <type>bitstream</type>
  <bundleName>ORIGINAL</bundleName>
  <checkSum checkSumAlgorithm="MD5">63634883c3cc2b837895c3b8bda9e815</checkSum>
  <description>SERVICE_DOCUMENT</description>
  <format>Unknown</format>
  <mimeType>application/octet-stream</mimeType>
  <retrieveLink>/bitstreams/1/retrieve</retrieveLink>
  <sequenceId>1</sequenceId>
  <sizeBytes>2758</sizeBytes>
  </bitstreams>
 
  <lastModified>2015-01-08 14:54:48.816</lastModified>
  <metadata><key>dc.contributor.author</key><value>ULTIMO, PRIMEIRO</value></metadata>
  <metadata><key>dc.date.issued</key><value>1234-12-12</value></metadata>
  <metadata><key>dc.identifier.citation</key><language>pt_BR</language><value>CITATION</value></metadata>
  <metadata><key>dc.description</key><language>pt_BR</language><value>DESCRICAO</value></metadata>
  <metadata><key>dc.description.abstract</key><language>pt_BR</language><value>ABSTRACT</value></metadata>
  <metadata><key>dc.language.iso</key><language>pt_BR</language><value>pt_BR</value></metadata>
  <metadata><key>dc.publisher</key><language>pt_BR</language><value>PUBLISHER</value></metadata>
  <metadata><key>dc.title</key><language>pt_BR</language><value>TITULO</value></metadata>
  <metadata><key>dc.title.alternative</key><language>pt_BR</language><value>OUTROS_TITULOS</value></metadata>
  <metadata><key>dc.type</key><language>pt_BR</language><value>Dataset</value></metadata>
  <withdrawn>false</withdrawn>
  </item>
 
 
  --
  Bruno Nocera Zanette
  +55 41 9992-2508



 --
 Bruno Nocera Zanette
 +55 41 9992-2508



Re: [Dspace-tech] Pull Requests

2015-01-13 Thread Christian Scheible
Thanks to both of you. I will improve this in my next pull request.
I recreated the master branch and will use pull instead of fetch.

Regards Christian

On 13.01.2015 at 08:59, Àlex Magaz Graça wrote:
 On 12/01/15 at 15:55, Christian Scheible wrote:
 Hi Helix,

 I think the problem is that I messed up my first branch and merged it
 with the master. This is what I always do:

 1. git checkout master
 2. git fetch upstream
 3. git branch DS-X
 4. git checkout DS-X
 5. make changes
 6. git commit
 7. git push origin

 or is the problem that I do git fetch upstream instead of git pull
 upstream master?

 I am going to include more information about the bugs and what the fixes
 are doing.

 Regards
 Christian

 On 12.01.2015 at 15:20, helix84 wrote:
  Hi Christian,
 
  I'm looking at your PRs. Don't worry that they're not perfect, the
  important thing is that the code is out there. We can take an
  individual commit (git cherry-pick) from it. I was thinking what might
  have gone wrong on your end and my guess is that you're not starting
  from the master branch. Before you create a new branch, make sure to
  checkout master (git checkout master) and update it (git pull upstream
  master). Then you can create a new branch, which can be used as a
  source for PR that will contain only the relevant commit(s).
 
  I also looked briefly at the bugs. It would help to include more
  detail about the problem so that the fixes can be quickly tested. It
  will help to include detailed steps to reproduce, expected and actual
  behaviour.
 
  Thanks.
 
  Regards,
  ~~helix84
 
  Compulsory reading: DSpace Mailing List Etiquette
  https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
 
 



 Hi Christian,

 You are missing a 'git merge upstream/master' after step 2.
 Alternatively, you could also run git pull instead of steps 1 and 2, as
 it says in the Development with Git wiki page [1]:

 The pull command executes git fetch, which retrieves the actual 
 changes followed by git merge, putting the changes in your codebase.

 [1] 
 https://wiki.duraspace.org/display/DSPACE/Development+with+Git#DevelopmentwithGit-OverviewoftheGitLifecycle
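
 As a rough illustration of the corrected sequence discussed in this thread (check out master, pull from upstream, then branch), here is a small Python sketch that assembles the git commands. The function name and its defaults are made up for this example; the git commands themselves are the ones quoted above.

```python
def branch_commands(branch, upstream="upstream", base="master"):
    """Return the git commands that start a topic branch from an up-to-date base."""
    return [
        ["git", "checkout", base],
        ["git", "pull", upstream, base],  # fetch and merge in one step
        ["git", "branch", branch],
        ["git", "checkout", branch],
    ]


# Print the commands instead of running them, so the sketch has no side effects.
for command in branch_commands("DS-X"):
    print(" ".join(command))
```

 To run them for real, each list could be passed to subprocess.run(command, check=True) inside a clone that has an "upstream" remote configured.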

 Cheers,
 Àlex




-- 
Christian Scheible
Softwareentwickler / Abt. Content-basierte Dienste
Kommunikations-, Informations- und Medienzentrum (KIM)
Universität Konstanz
78457 Konstanz
+49 (0)7531 / 88-2857
Raum B 703



Re: [Dspace-tech] Change Solr port from 8080 to 8081

2015-01-13 Thread helix84
Change the port number in the following properties and restart your
servlet container:

search.server in [dspace]/config/modules/discovery.cfg
solr.url in [dspace]/config/modules/oai.cfg
server in [dspace]/config/modules/solr-statistics.cfg
solr.authority.server in [dspace]/config/dspace.cfg (DSpace 5)
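
If you prefer to script the change, a minimal Python sketch along these lines could rewrite the port in the text of each file. The helper name and the sample property value are assumptions for illustration; check what your own files actually contain (the URL may be built from another property) before applying anything like this.

```python
import re


def change_port(config_text, old_port, new_port):
    """Replace :old_port with :new_port in every http(s) URL in the text."""
    pattern = re.compile(r"(https?://[^/\s:]+):%d\b" % old_port)
    return pattern.sub(r"\g<1>:%d" % new_port, config_text)


# Hypothetical property line; your discovery.cfg may look different.
sample = "search.server = http://localhost:8080/solr/search\n"
print(change_port(sample, 8080, 8081), end="")
```

Only the port directly after the host name is touched, so a number such as 8080 appearing elsewhere in the file is left alone.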


Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette



Re: [Dspace-tech] follow-up to #DS-1481 DS-1822

2015-01-13 Thread Hilton Gibson
Hi All

Just to add: our upgrade from 3.2 to 4.2 involved fixing a lot of
date-issued metadata.
The Solr Discovery date-issued index was a mess after the upgrade.
Updating the existing metadata-checking curation task to also check the
date-issued format would help a lot.

Cheers

hg

*Hilton Gibson*
Ubuntu Linux Systems Administrator
JS Gericke Library
Room 1025C
Stellenbosch University
Private Bag X5036
Stellenbosch
7599
South Africa

Tel: +27 21 808 4100 | Cell: +27 84 646 4758

On 13 January 2015 at 02:10, LiYu Lilly lill...@hotmail.com wrote:

 Dear all,

 1. A quick question related to Jira DS-1481: in
 google-metadata.properties, shouldn't "| dc.date.available |
 dc.date.accessioned" be removed from

  # 42  google.citation_date = dc.date.copyright | dc.date.issued | dc.date.available | dc.date.accessioned

 since dc.date.available and dc.date.accessioned should not be options
 for the google.citation_date field?

 ( http://scholar.google.com/intl/en/scholar/inclusion.html#indexing )

 2. What's the status of DS-1822 (http://jira.duraspace.org/browse/DS-1822)?
 Has anyone tested Richard's curation task tool?

 Thanks much,
 Lilly




Re: [Dspace-tech] DSpace-CRIS 4.1.2 - Issue on items display in ResearcherProfile

2015-01-13 Thread Alexander Wong
Dear Pablo,

Thanks again.
It would simply be /jspui/cris/rp/rp5?open=journal
For the time being, I am forcing generation of the anchor
/jspui/cris/rp/rp0005?open=journal#dspaceitems to make the jump succeed.

Best Regards,
Alexander Wong

On Tue, Jan 13, 2015 at 8:21 PM, Pablo Buenaposada 
pablo.buenapos...@csuc.cat wrote:

 About issue 1: yes, it seems that is not working well; I will look into it.
 Issue 2 is strange, since mine is working well. I don't know if I forgot to
 tell you about more code fixes in other files; I can't remember now...

 Tell me what URL you get in the red section of the attached image.
 http://dspace.2283337.n4.nabble.com/file/n4676176/untitled.png



 --
 View this message in context:
 http://dspace.2283337.n4.nabble.com/DSpace-CRIS-4-1-2-Issue-on-items-display-in-ResearcherProfile-tp4676116p4676176.html
 Sent from the DSpace - Tech mailing list archive at Nabble.com.



Re: [Dspace-tech] follow-up to #DS-1481 DS-1822

2015-01-13 Thread helix84
On Tue, Jan 13, 2015 at 9:30 AM, Hilton Gibson hilton.gib...@gmail.com wrote:
 Updating the existing curation task to check metadata, with a date-issued
 format would help a lot.

Hi Hilton, see DS-1775.


Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette



Re: [Dspace-tech] [Dspace-general] How tp define IP-based policy?

2015-01-13 Thread Monika C. Mevenkamp
Olivier,

Look for the IP Authentication chapter in the DSpace documentation for your
version for details.

We put IP authentication first in the sequence so that the system tries it
before it moves on to the next method.

This is done in dspace/config/modules/authentication.cfg:

plugin.sequence.org.dspace.authenticate.AuthenticationMethod = \
 org.dspace.authenticate.IPAuthentication, \
 org.dspace.authenticate.PasswordAuthentication


You also need to define a group in the admin web UI; we defined Princeton_IPs.

In addition, you need to define the IP range for that group. In our case we
added

ip.Princeton_IPs = 0, xx.yyy.zzz, aaa.bbb 

to dspace/config/modules/authentication-ip.cfg


Last, you have to set the policies on your items and bitstreams as needed, using
the Princeton_IPs group.
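
To illustrate the idea of mapping a visitor's address to a group, here is a minimal Python sketch. It is not how DSpace implements IP authentication, and it uses CIDR notation rather than the configuration syntax shown above. The function name and the address ranges (reserved documentation ranges, not Princeton's) are assumptions for the example.

```python
import ipaddress


def groups_for_address(address, group_ranges):
    """Return the names of all groups whose networks contain the address."""
    ip = ipaddress.ip_address(address)
    return sorted(
        name
        for name, networks in group_ranges.items()
        if any(ip in ipaddress.ip_network(net) for net in networks)
    )


# Hypothetical ranges taken from the reserved documentation blocks.
ranges = {"Princeton_IPs": ["192.0.2.0/24", "198.51.100.0/25"]}
print(groups_for_address("192.0.2.17", ranges))
print(groups_for_address("203.0.113.5", ranges))
```

A visitor whose address falls in one of the ranges is treated as a member of the group, and the item and bitstream policies granted to that group then apply.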

Monika


Monika Mevenkamp
phone: 609-258-4161
693 Alexander Road, Princeton University, Princeton, NJ 08544


On Jan 13, 2015, at 3:38 AM, Olivier Nicole olivier.nic...@cs.ait.ac.th wrote:

 Hi,
 
 DSpace 4.x documentation mentions:
 
 For example, it is not unusual for content destined for DSpace to come
 with permanent restrictions on use or access based on license-driven or
 other IP-based requirements that limit access to institutionally
 affiliated users. Restrictions such as these are imposed and managed
 using standard administrative tools in DSpace, typically by attaching
 specific policies to Items, Collections, Bitstreams, etc.
 
 But I cannot find a way to define such IP-based policies, and there
 are some bitstreams that I need to restrict to people browsing from
 within our LAN.
 
 How can I do that?
 
 Best regards,
 
 Olivier
 -- 
 
 ___
 Dspace-general mailing list
 dspace-gene...@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/dspace-general




Re: [Dspace-tech] DSpace-CRIS 4.1.2 - Issue on items display in ResearcherProfile

2015-01-13 Thread Pablo Buenaposada
About issue 1: yes, it seems that is not working well; I will look into it.
Issue 2 is strange, since mine is working well. I don't know if I forgot to tell
you about more code fixes in other files; I can't remember now...

Tell me what URL you get in the red section of the attached image.
http://dspace.2283337.n4.nabble.com/file/n4676176/untitled.png 



--
View this message in context: 
http://dspace.2283337.n4.nabble.com/DSpace-CRIS-4-1-2-Issue-on-items-display-in-ResearcherProfile-tp4676116p4676176.html
Sent from the DSpace - Tech mailing list archive at Nabble.com.



Re: [Dspace-tech] Fwd: Null pointer exception while using configurable workflow, select Reviewer Step

2015-01-13 Thread helix84
I looked only briefly, the immediate problem seems to be that either the
eperson or (more likely) the workflow item is null:

https://github.com/DSpace/DSpace/blob/dspace-5.0-rc3/dspace-api/src/main/java/org/dspace/xmlworkflow/WorkflowRequirementsManager.java#L126

I don't know what it means in the bigger picture.

Meanwhile you should try following the new version of the docs:
https://wiki.duraspace.org/display/DSDOC5x/Configurable+Workflow

There have been changes to that page; at least the way the extra
database tables are created is now different (DS-2243).


Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

Re: [Dspace-tech] Possible bug in restricted PDFs/ extracted text/ indexing

2015-01-13 Thread Tim Donohue
Hi Ryan (and all),

Had a moment this morning to dig a little deeper here...

From what I can tell, it looks like this *may* be the result of a
flaw/bug in the logic of the Discovery Access Rights Awareness feature
(which is supposed to respect access restrictions on Items).

I believe what may be going on is the following:

1. Discovery sees the Item as being Anonymous READ so it makes the 
Item metadata searchable to anonymous users (which is appropriate in 
most scenarios obviously)

2. However, it looks like Discovery may not *check* to see if any Files
(Bitstreams) are more tightly restricted. So, based on my skimming the
code, it looks like it assumes that if the Item is Anonymous READ,
then all its Files should just be indexed & searchable. In your
scenario, this is obviously a wrong assumption, as it results in your
restricted PDF being publicly searchable (and a snippet of that
restricted PDF's text appears in the search results).

Again though, this is me just *skimming the code* (links below for 
interested developers). I might be misunderstanding something here.

I'm copying in @mire staff (since they helped build this new Discovery 
Access Rights Awareness feature into DSpace 4.x). Kevin or Bram, am I 
understanding the code here properly? Have you ever encountered this 
before or know of a workaround/fix?

Thanks,

Tim


Relevant Code Links:
-
* SolrServiceResourceRestrictionPlugin seems to be the class that access 
restricts certain objects in Discovery/Solr queries, but it only seems 
to be used at the ITEM level (and never for individual Bitstreams): 
https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/discovery/SolrServiceResourceRestrictionPlugin.java

* SolrServiceImpl is what actually indexes the extracted TEXT 
bitstreams. But, from what I can tell, it NEVER checks to see if the 
extracted TEXT is access restricted (i.e. it just assumes the extracted 
text has the same permissions as the overall item). Here's that area of 
the code: 
https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/discovery/SolrServiceImpl.java#L1370
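
To make the suspected gap concrete, here is a minimal Python sketch of the per-bitstream check that appears to be missing. This is not DSpace code: the Item and Bitstream classes, the read_groups field, and the function name are a made-up model for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Bitstream:
    name: str
    read_groups: set
    extracted_text: str = ""


@dataclass
class Item:
    read_groups: set
    bitstreams: list = field(default_factory=list)


def public_fulltext(item, public_group="Anonymous"):
    """Collect extracted text only from bitstreams the public group may read."""
    if public_group not in item.read_groups:
        return []
    return [
        b.extracted_text
        for b in item.bitstreams
        if public_group in b.read_groups and b.extracted_text
    ]


# A public item with one public and one restricted PDF (hypothetical data).
item = Item(
    read_groups={"Anonymous"},
    bitstreams=[
        Bitstream("public.pdf", {"Anonymous"}, "public text"),
        Bitstream("hidden.pdf", {"Staff"}, "secret text"),
    ],
)
print(public_fulltext(item))
```

With a check like this, the restricted PDF's text would never reach the publicly searchable full-text field, even though the item itself is Anonymous READ.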



On 1/12/2015 3:59 PM, Steans, Ryan J wrote:
 Hi all,

 I mailed earlier today, but it doesn’t look like the mail went through.
 Apologies if you hear from me twice in one day on this same topic.

 Here’s our situation –

 We have a user placing PDFs into DSpace as part of a complex item with
 multiple files and file types.  She has set some of the bitstreams to be
 hidden, but once the text has been extracted, a search will turn up the
 extracted text even though the actual PDF and TXT files are hidden.

 So, if the name “Ryan Steans” was in a hidden PDF, the PDF itself
 would stay hidden, but my search result in DSpace would turn up
 “Ryan Steans” and about 500 characters of text surrounding that name.

 Some additional details on our use case -

   * The item itself is public and is set to anonymous READ.
   * This particular item has 1 Mp4 and 3 PDF's as bitstreams, all except
 for 1 PDF are set to anonymous READ.
   * None of the bitstreams are set as the primary bitstream for the item.
   * the 1 PDF that is set to restricted READ is the one that the media
 filter is parsing and inserting into the fulltext value in solr...
 the other 2 PDF's are not being indexed as fulltext and their
 contents are not searchable through Discovery (or searchable in SOLR).
   * The generated TXT file from the PDF has the same permissions
 (restricted) as the original source bitstream.

 The problem is that if you happen to search for anything in the fulltext
 of the restricted item, it will show up in the results and the first
 ~500 chars of the parsed-restricted-text file are displayed in the
 search results.

 Looking to see if this is something anyone else has seen.

 Is this an indexing problem?  Have we found a bug?

 thanks

 *Ryan Steans*

 Director of Operations

 Texas Digital Library

 512-495-4403

 Web: http://www.tdl.org/

 Twitter: @TxDigLibrary http://twitter.com/TXDigLibrary

 Facebook: http://www.facebook.com/texasdigitallibrary

 Join the e-mail list: http://tdl.org/news/newsletters/newsletter-signup/


[Dspace-tech] Discovery Indexing questions

2015-01-13 Thread Monika C. Mevenkamp
After some more investigation I found that all docs in my solr index with
search.resourcetype:2
are also
discoverable:false

I take it that those are the docs corresponding to dspace items.

What determines an item’s “discoverable” setting ?

We have a mix of openly accessible and restricted content and want all metadata 
indexed. We may want to turn off full text indexing completely.  We definitely 
want to exclude documents in a particular community from full text indexing or 
at least we want to make sure that the information from full text indexing is 
only shown to site visitors with the right credentials;
Is /dspace/bin/dspace index-discovery the way to go ?
Can its remove option be applied to communities / collections ?
Does that make the enclosed collections and  items invisible to all site 
visitors - including the admin user  ?

Is it true that the database browse indexes become obsolete when using 
discovery/solr ?

Some of the answers may be in the documentation - please point me there if I 
did overlook them


Monika



Monika Mevenkamp
phone: 609-258-4161
693 Alexander Road, Princeton University, Princeton, NJ 08544


On Jan 12, 2015, at 12:20 PM, Monika C. Mevenkamp 
moni...@princeton.edu wrote:


I upgraded to 4.0 (from 1.8), ran dspace index-discovery, and can see lots of docs
in the Solr admin UI.

Running a query

/solr/search/select?q=search.resourcetype%3A2+AND+search.resourceid%3A1&wt=javabin&version=2

gives a couple results, yet when I click on browse I get ‘No Entries in Index’ 
in the JSPUI.
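
For anyone who wants to repeat this kind of check by hand, here is a minimal Python sketch that builds a comparable select URL. The field names come from the query above; the function name and the choice of wt=json (easier to read than javabin) are assumptions for this example.

```python
from urllib.parse import urlencode


def solr_select_url(base, resource_type, resource_id):
    """Build a Solr select URL for one DSpace object in the search core."""
    query = "search.resourcetype:%d AND search.resourceid:%d" % (
        resource_type,
        resource_id,
    )
    # urlencode escapes ":" as %3A and spaces as "+", as in the logged query.
    return base + "/select?" + urlencode({"q": query, "wt": "json"})


print(solr_select_url("http://localhost:8080/solr/search", 2, 1))
```

The URL is only printed here; open it in a browser or fetch it with any HTTP client to see the matching documents and their fields.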

The log below suggests otherwise as well.

What might be going wrong ?

Monika



2015-01-12 12:10:08,327 DEBUG org.dspace.app.webui.servlet.DSpaceServlet @ 
anonymous:session_id=B034BADCB247333CEA51924C910AB7DF:ip_addr=0:0:0:0:0:0:0:1:http_request:--
 URL Was\colon; http\colon;//localhost\colon;8080/jspui/browse?type=dateissued
-- Method\colon; GET
-- Parameters were\colon;
-- type\colon; dateissued

2015-01-12 12:10:08,331 INFO  
org.dspace.app.webui.servlet.AbstractBrowserServlet @ 
anonymous:session_id=B034BADCB247333CEA51924C910AB7DF:ip_addr=0:0:0:0:0:0:0:1:browse:type=dateissued,order=DESC,value=null,month=null,year=null,starts_with=null,vfocus=null,focus=-1,rpp=20,sort_by=2,community=n/a,collection=n/a,level=0,etal=-1
2015-01-12 12:10:08,335 DEBUG org.dspace.browse.BrowseEngine @ 
anonymous:session_id=B034BADCB247333CEA51924C910AB7DF:ip_addr=0:0:0:0:0:0:0:1:browse:
2015-01-12 12:10:08,335 INFO  org.dspace.browse.BrowseEngine @ 
anonymous:session_id=B034BADCB247333CEA51924C910AB7DF:ip_addr=0:0:0:0:0:0:0:1:browse_by_item:
2015-01-12 12:10:08,335 DEBUG org.dspace.browse.BrowseEngine @ 
anonymous:session_id=B034BADCB247333CEA51924C910AB7DF:ip_addr=0:0:0:0:0:0:0:1:get_total_results:distinct=false
2015-01-12 12:10:08,342 DEBUG org.dspace.discovery.SolrServiceImpl @ Solr URL: 
http://localhost:8080/solr/search
2015-01-12 12:10:08,469 DEBUG 
org.apache.http.impl.conn.PoolingClientConnectionManager @ Connection request: 
[route: {}-http://localhost:8080][total kept alive: 0; route allocated: 0 of 
32; total allocated: 0 of 128]
2015-01-12 12:10:08,484 DEBUG 
org.apache.http.impl.conn.PoolingClientConnectionManager @ Connection leased: 
[id: 0][route: {}-http://localhost:8080][total kept alive: 0; route allocated: 
1 of 32; total allocated: 1 of 128]
2015-01-12 12:10:08,485 DEBUG 
org.apache.http.impl.conn.DefaultClientConnectionOperator @ Connecting to 
localhost:8080
2015-01-12 12:10:08,511 DEBUG org.apache.http.client.protocol.RequestAddCookies 
@ CookieSpec selected: best-match
2015-01-12 12:10:08,528 DEBUG org.apache.http.client.protocol.RequestAuthCache 
@ Auth cache not set in the context
2015-01-12 12:10:08,528 DEBUG 
org.apache.http.client.protocol.RequestTargetAuthentication @ Target auth 
state: UNCHALLENGED
2015-01-12 12:10:08,529 DEBUG 
org.apache.http.client.protocol.RequestProxyAuthentication @ Proxy auth state: 
UNCHALLENGED
2015-01-12 12:10:08,529 DEBUG 
org.apache.http.impl.client.SystemDefaultHttpClient @ Attempt 1 to execute 
request
2015-01-12 12:10:08,529 DEBUG org.apache.http.impl.conn.DefaultClientConnection 
@ Sending request: GET 
/solr/search/select?q=search.resourcetype%3A2+AND+search.resourceid%3A1&wt=javabin&version=2
 HTTP/1.1
2015-01-12 12:10:08,529 DEBUG org.apache.http.wire @  GET 
/solr/search/select?q=search.resourcetype%3A2+AND+search.resourceid%3A1&wt=javabin&version=2
 HTTP/1.1[\r][\n]
2015-01-12 12:10:08,530 DEBUG org.apache.http.wire @  User-Agent: 
Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0[\r][\n]
2015-01-12 12:10:08,530 DEBUG org.apache.http.wire @  Host: 
localhost:8080[\r][\n]
2015-01-12 12:10:08,530 DEBUG org.apache.http.wire @  Connection: 
Keep-Alive[\r][\n]
2015-01-12 12:10:08,531 DEBUG org.apache.http.wire @  [\r][\n]
2015-01-12 12:10:08,531 DEBUG org.apache.http.headers @  GET 
/solr/search/select?q=search.resourcetype%3A2+AND+search.resourceid%3A1&wt=javabin&version=2
 HTTP/1.1
2015-01-12