RE: Invenio Developer Forum (topics)

2014-07-15 Thread Pedro Miguel Paiva Gaudencio
Hey there!

Oh, I think I told you that I won't be here: I'm on holiday from this Thursday 
until next week. But maybe Giorgos or Kamil can present it, they're quite 
familiar with it by now! ;)

Cheers,
Pedro



From: Lars Holm Nielsen
Sent: Tuesday, July 15, 2014 12:54 PM
To: project-invenio-devel (Invenio developers mailing-list)
Subject: Invenio Developer Forum (topics)

Dear all,

Next week's developer forum will be on metadata extraction. Kostas will present 
his initial work on extracting metadata from uploaded files. This is useful for 
making people type less when depositing files as well as for semi-automated 
curation.

@Pedro: will you also present the work you have been doing on CrossRef and 
arXiv metadata import?

Cheers,
Lars


--
Lars Holm Nielsen
CERN, IT Department, Collaboration & Information Services
http://zenodo.org | Tel: +41 22 76 79182 | Cel: +41 76 672 8927



RE: [pu-branch] Deposit submission upload

2014-03-18 Thread Pedro Miguel Paiva Gaudencio
Did you install the invenio-demosite?

Yep, you saw it running.

Yes, I think you should only use Tibor’s approach if you don’t need the recid 
afterwards. Querying for the last recid is an unreliable way to obtain it.

You can test it like this:

from invenio.modules.records.api import Record
r = Record.create({'recid': 1234}, 'json')
r.produce('json_for_marc')

This should give you something like this:

[{'005': '20140318071429.0'}, {'001': 1234}]

Yeah, so something's definitely wrong, because it returns an empty list [] for me...

Cheers,
Pedro


RE: [pu-branch] Deposit submission upload

2014-03-18 Thread Pedro Miguel Paiva Gaudencio
I got it working now. I used guess_legacy_field_names() to check which field 
corresponds to the rule '001'; it worked with '_id', so I just added it from 
sip.metadata['recid'] while making the record (and before producing the 
marcxml).
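
A minimal sketch of the fix described above, using plain dictionaries in place 
of the real SIP and Record objects. The helper name with_record_id is 
hypothetical; only the '_id' field name and the 'recid' metadata key come from 
this thread.

```python
def with_record_id(metadata):
    """Return a copy of the SIP metadata with '_id' set from 'recid'.

    Hypothetical helper: it stands in for the step done while making the
    record, before the marcxml is produced.
    """
    if 'recid' not in metadata:
        raise KeyError("SIP metadata has no 'recid' to copy into '_id'")
    record_json = dict(metadata)            # leave the SIP metadata untouched
    record_json['_id'] = metadata['recid']  # field matched by the '001' rule
    return record_json


sip_metadata = {'recid': 1234, 'title': 'Example deposition'}
record_json = with_record_id(sip_metadata)
```

Copying the metadata instead of editing it in place keeps the SIP unchanged if 
a later step fails and the record has to be rebuilt.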

A post-upload hook would take care of updating record ID - SIP
relationship in this approach.  Kind of like bst_send_email.py takes
care to email the original submitter of a document only after the record
is fully ingested.

I gave preference to the other approach, seemed less complex and with less post 
work to do.

Thank you for all the help and suggestions,
Pedro


RE: [pu-branch] Deposit submission upload

2014-03-18 Thread Pedro Miguel Paiva Gaudencio
I think there is supposed to be an intermediate step in the workflow, before 
the upload task, where a human reviews and accepts/rejects the deposition, so 
this should be fine. Of course, some extra work could be done to help run the 
process.

Cheers,
Pedro

From: Tibor Simko
Sent: Tuesday, March 18, 2014 5:33 PM
To: Pedro Miguel Paiva Gaudencio
Cc: Lars Holm Nielsen; Javier Martin Montull; project-invenio-devel (Invenio 
developers mailing-list)
Subject: Re: [pu-branch] Deposit submission upload

On Tue, 18 Mar 2014, Pedro Miguel Paiva Gaudencio wrote:
 I gave preference to the other approach, seemed less complex and with
 less post work to do.

The major drawback of this approach is vulnerability: if this submission
remains open to guests, as most existing INSPIRE submissions are, then
watch for script kiddies playing DoS games, or for some automated script
going mad, etc.

Best regards
--
Tibor Simko


RE: [pu-branch] Deposit submission upload

2014-03-18 Thread Pedro Miguel Paiva Gaudencio
A captcha is probably the best solution, I'll speak to Javier and let him know.

Cheers,
Pedro

From: Lars Holm Nielsen
Sent: Tuesday, March 18, 2014 9:20 PM
To: Tibor Simko
Cc: Pedro Miguel Paiva Gaudencio; Javier Martin Montull; project-invenio-devel 
(Invenio developers mailing-list)
Subject: Re: [pu-branch] Deposit submission upload

On 18 Mar 2014, at 17:33, Tibor Simko tibor.si...@cern.ch wrote:

 On Tue, 18 Mar 2014, Pedro Miguel Paiva Gaudencio wrote:
 I gave preference to the other approach, seemed less complex and with
 less post work to do.

 The major drawback of this approach is vulnerability: if this submission
 remains open to guests, as most existing INSPIRE submissions are, then
 watch for script kiddies playing DoS games, or for some automated script
 going mad, etc.

I think both approaches have the same vulnerability? In both cases a recid is 
generated no matter what - in one case synchronously, in the other case 
asynchronously. Probably a captcha would be a good idea for INSPIRE submissions?

Cheers,
Lars



 Best regards
 --
 Tibor Simko



RE: [pu-branch] Deposit submission upload

2014-03-17 Thread Pedro Miguel Paiva Gaudencio
Hi Lars,

Yes, perfectly! I got it working. What I'm doing is not creating the recid 
prior to the upload and letting bibupload -i generate it. The record is 
created without problems; what I hadn't thought about was how to link the 
deposition with the record so it can be edited afterwards... Is there a way to 
get the last recid generated? Perhaps adding the recid to the sip would solve it.

I also tried reserving the recid in the sip prior to the bibupload, but then 
when the marcxml is uploaded the preview points to the wrong record (the 
previous one, of course), because bibupload ran with -i since the recid wasn't 
present in the xml.

So, should I add the recid to the sip after the marcxml is uploaded or create 
an empty dummy record prior to the bibupload and then just update it in the 
bibupload?

Cheers,
Pedro


From: Lars Holm Nielsen
Sent: Monday, March 17, 2014 8:40 AM
To: Pedro Miguel Paiva Gaudencio
Cc: Javier Martin Montull; project-invenio-devel (Invenio developers 
mailing-list)
Subject: Re: [pu-branch] Deposit submission upload

Hi Pedro,

1) Generating recid prior to upload:
It all depends on the workflow and what else you need to do. E.g. in Zenodo 
I need to know the recid prior to uploading, because I use the recid to 
generate a DOI which goes into the marcxml. Also, knowing the recid before 
bibupload runs allows me to quickly generate a preview and a record link to the 
soon-to-be-uploaded record, which I can display to the end-user right after 
they hit submit. In another workflow, it might be fine not to know the recid 
until after bibupload has run.
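
As a rough illustration of why a recid reserved up front is convenient, the 
sketch below derives a DOI and a preview link before any upload happens. 
Everything in it is an assumption made for the example: reserve_recid() is a 
stand-in counter rather than the real reservation call, and the DOI prefix, 
DOI suffix pattern and URL layout are placeholders.

```python
import itertools

# Hypothetical stand-in for whatever hands out record ids (e.g. a DB sequence).
_recid_sequence = itertools.count(1)


def reserve_recid():
    """Return the next free record id (illustrative only)."""
    return next(_recid_sequence)


def describe_pending_record(recid, doi_prefix, site_url):
    """Build values that can be shown to the user before bibupload runs."""
    return {
        'recid': recid,
        # Placeholder DOI pattern: <prefix>/zenodo.<recid>
        'doi': '{0}/zenodo.{1}'.format(doi_prefix, recid),
        # Placeholder URL layout: <site>/record/<recid>
        'preview_url': '{0}/record/{1}'.format(site_url.rstrip('/'), recid),
    }


pending = describe_pending_record(
    reserve_recid(), doi_prefix='10.5072', site_url='http://localhost:4000/')
```

If the recid only becomes known after bibupload has finished, neither value 
can be computed at submit time, which is the trade-off described in point 1.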

2) JSONAlchemy: All the workflows should be moved to invenio-demosite, where 
you should have the recid 
(https://github.com/inveniosoftware/invenio-demosite/blob/pu/invenio_demosite/recordext/fields/atlantis.cfg#L698).
 It's WIP at the moment, and Esteban should soon have some changes coming in 
for JSONAlchemy.
I.e. you should install invenio-demosite on top of Invenio as well, and we 
should move the workflows out of Invenio to invenio-demosite.

Does that answer your questions?

Cheers,
Lars



--
Lars Holm Nielsen
CERN, IT Department, Collaboration & Information Services
http://zenodo.org | Tel: +41 22 76 79182 | Cel: +41 76 672 8927



[pu-branch] Deposit submission upload

2014-03-14 Thread Pedro Miguel Paiva Gaudencio
Hi Lars,

I got the deposit submission upload working, with just a few things left (I 
think/hope): the marcxml is generated without the 001 field (record id; 
bibupload runs in -r mode in upload_record_sip() and fails because the recid 
was previously created) and without the 980 field (collection information 
[article, book, preprint, report, etc.], without which the record is hidden by 
default).

I understood that the recid is not supposed to be present in the new records' 
marcxml, but if I don't generate the recid (reserved_recid() and 
create_recid()) the workflow will fail when it gets to run_tasks().

I also understood (not quite sure if I'm right) that when we upload the new 
deposition, a marcxml file will be generated from the json that the sip 
contains.

I checked jsonalchemy.get_producer_rules() and it does not contain any rule 
for the 'recid', so this is probably why it's not being generated (from the 
json) along with the rest of the xml (in 
jsonalchemy.wrappers.legacy_export_as_marc()).

For the upload of new records to work smoothly, do we need to:

  *   add the 001 (adding rules for 'recid' in the producer rules?) and 980 
fields to the marcxml?
  *   add only the 980 field and always upload_record_sip() in -i mode?
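
One way to picture the two options is a small check that picks the bibupload 
mode from the marcxml itself: -r when a 001 controlfield is present, -i when 
it is not. This is only a sketch under that assumption; has_recid() and 
bibupload_mode() are hypothetical helpers, not part of Invenio, and the sample 
records are made up.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit('}', 1)[-1]


def has_recid(marcxml):
    """Return True if the record carries a non-empty 001 controlfield."""
    root = ET.fromstring(marcxml)
    for element in root.iter():
        if (local_name(element.tag) == 'controlfield'
                and element.get('tag') == '001'
                and (element.text or '').strip()):
            return True
    return False


def bibupload_mode(marcxml):
    """Pick '-r' (replace) when a recid is present, else '-i' (insert)."""
    return '-r' if has_recid(marcxml) else '-i'


NEW_RECORD = (
    '<record><datafield tag="980" ind1=" " ind2=" ">'
    '<subfield code="a">ARTICLE</subfield></datafield></record>')
EXISTING_RECORD = '<record><controlfield tag="001">1234</controlfield></record>'

modes = (bibupload_mode(NEW_RECORD), bibupload_mode(EXISTING_RECORD))
```

The namespace stripping is there because MARCXML is sometimes serialized with 
a namespace on every element and sometimes without one.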

Do we need the recid already reserved and created in the sips for the new 
records before the upload (since when a new record is inserted by bibupload a 
recid is created for that record)? If so, why?


This is my workflow (note that I'm only uploading new records and never editing 
existing submissions):

  1.  prefill_draft(draft_id='default'),
  2.  render_form(draft_id='default'),
  3.  prepare_sip(),
  4.  reserved_recid(),
  5.  create_recid(),
  6.  process_sip_metadata(process_recjson_new),
  7.  finalize_record_sip(),
  8.  upload_record_sip(),
  9.  run_tasks(update=False)

Sorry about the extensive reading.

Thanks in advance,
Pedro


[pu-branch] Deposit submission taking too long

2014-03-13 Thread Pedro Miguel Paiva Gaudencio
Javier,

Not directly related to the feature I'm adding, but I think I found why the 
deposit submission is taking so long. When the autocomplete query for the 
Sherpa/RoMEO search is performed, it takes about 3-4 minutes to get this 
(http://www.sherpa.ac.uk/romeo/api29.php?jtitle=s&qtype=contains), and in the 
end nothing is returned. I don't know, it might be related...

I won't work on this now, I just thought it would be nice to have some notes to 
help someone (if it isn't me) fix it later.
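
For whoever picks this up, here is a hedged sketch of one possible mitigation: 
skip very short search terms and put a timeout on the request. It is written 
for Python 3 and is not the existing autocomplete code. The function names, the 
three-character minimum and the five-second timeout are all assumptions, and 
the query string assumes the parameters are jtitle and qtype as in the URL 
above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

ROMEO_API = 'http://www.sherpa.ac.uk/romeo/api29.php'


def romeo_query_url(term, qtype='contains'):
    """Build the journal-title query URL for a search term."""
    return ROMEO_API + '?' + urlencode({'jtitle': term, 'qtype': qtype})


def should_query(term, min_chars=3):
    """Skip terms so short that a 'contains' search matches almost anything."""
    return len(term.strip()) >= min_chars


def fetch_romeo(term, timeout=5.0):
    """Return the raw response body, or None if skipped or failed."""
    if not should_query(term):
        return None
    try:
        with urlopen(romeo_query_url(term), timeout=timeout) as response:
            return response.read()
    except OSError:
        # Covers timeouts and connection errors; the form stays usable.
        return None
```

With this shape a one-letter term such as 's' never reaches the remote 
service, and a slow response costs at most the timeout instead of blocking the 
submission form.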

Cheers,
Pedro


[pu branch] Installing different python version (2.7.5+)

2014-03-04 Thread Pedro Miguel Paiva Gaudencio
Hi all,

To all pu branch related developers: yesterday I set up the pu invenio 
environment, and since some troublesome issues weren't documented here 
(https://docs.google.com/document/d/1PNjR67QMFeDQIDOIC_NgOrTrNcgtUIZEEPMd6crwyBs),
I found a bash script and modified it to install everything quietly and 
without worries. I think it would be useful to share it with newcomers to pu 
(or perhaps include it in the HOWTO document). Anyway, I uploaded it to my 
github (https://github.com/pedrogaudencio/invenio-pu-scripts), hope it's useful.

BTW, as Graham mentioned, shouldn't we have a mailing list or something just 
for pu branch (so that we don't bother everyone with pu related stuff)?

Cheers,
Pedro