Resending to the mailing list too.

-------- Forwarded Message --------
Subject: Re: Proposal for High level sftp api for uploads and downloads
Date: Fri, 2 Jun 2023 10:54:56 +0200
From: Jakub Jelen <jje...@redhat.com>
To: Eshan Kelkar <eshankel...@galorithm.com>

Thank you for the great write-up!

I put some comments and thoughts inline. Let me know if some of them are unclear.

On 5/31/23 09:38, Eshan Kelkar wrote:
Libssh needs a high-level API for uploading and
downloading files. Functions like sftp_put() and
sftp_get() need to be introduced, which users can
simply call to upload and download files, instead
of having to write their own transfer routines on
top of the low-level read/write APIs.

One more reason why we do this is that the API needs to have reasonable throughput. With the current synchronous API, the performance is terrible, and there was only a low-level asynchronous API for downloads (hard to guess how much it is used, though).

This mail suggests approaches that can be followed
to develop this API.
Some terminology:
1. local_fd - is the file descriptor [obtained using open()
or creat()] of a file on the local machine of the user.

2. remote_file - is an sftp file handle [obtained using
sftp_open()] of a file on the connected server.

3. concurrent_requests - This is the number of requests
we initially issue before trying to get the responses.

Say you are downloading with 20 concurrent requests. This
means we first issue 20 read requests, and then, while we
try to get/wait for the response to the first request, the
server may be processing, or may have already processed,
the other 19 (this is the advantage of the asynchronous
request-response pattern over the synchronous one, where we
would issue the next request [and the server would process
it] only after getting the response to the first request).

After getting the response to the first request, we issue
another request if needed, then try to get the responses
to the other outstanding requests, repeating this procedure
for them too.

Though as you can see, the name "concurrent_requests" is not
the best: the requests are not issued concurrently here, nor
are they processed concurrently by the server. So if a better
name comes to your mind, kindly suggest.

Reading through this, I would probably consider slightly better naming. These requests are in fact not concurrent, but work as you describe. This is the number of in-flight requests, which are sent sequentially, but the handling of the server responses is interleaved with the other requests.

My proposal would be to call them in_flight_requests or interleaved_requests, but there might be better names. I would like to avoid the parallel/concurrent wording, which suggests there is some real threading/parallelism involved.

OpenSSH's sftp manual page describes this as "num requests", which I also find quite unclear (and it defaults to 64):

Specify how many requests may be outstanding at any one time.

What you describe above is the current way the asynchronous transfers are done. I assume that for performance, the chunk size, the number of in-flight requests and other parameters need to be tuned, but we should provide some reasonable defaults.

Another option is that there might be a completely different way to do this, which we should probably research/try before settling on a particular solution.

4. chunk_size - The number of bytes we usually issue a
read/write request for; if the number of bytes left to
read/write is less than chunk_size, the request is issued
for only that many bytes (e.g. with a chunk_size of 32768
and 1000 bytes remaining, the request is for 1000 bytes).

It is not completely clear to me what you are trying to describe in the second part of the paragraph. This is indeed one of the parameters that affect the throughput and the resulting transfer speed/protocol overhead.

How the API can look from the user's perspective:

Approach 1 :
----------------------------------------------
The user has to pass each of the 4 required things
for the transfer to the put/get functions.

For the default values of concurrent_requests and
chunk_size we can provide macros (which expand to the
default values suggested by libssh) that the user can
use if they are not interested in setting custom values.

int sftp_put(int local_fd,
             sftp_file remote_file,
             int concurrent_requests,
             size_t chunk_size);

int sftp_get(int local_fd,
             sftp_file remote_file,
             int concurrent_requests,
             size_t chunk_size);
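
For illustration, a call could then look like this (a
sketch; the SFTP_DEFAULT_* macro names below are
placeholders for the suggested defaults, not existing
libssh identifiers):

/* Sketch of Approach 1 usage; SFTP_DEFAULT_* are placeholder
 * macros for the libssh-suggested defaults. */
int rc;

rc = sftp_put(local_fd, remote_file,
              SFTP_DEFAULT_CONCURRENT_REQUESTS,
              SFTP_DEFAULT_CHUNK_SIZE);
if (rc != SSH_OK) {
    /* handle the failed upload */
}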

Approach 2 :
-------------------------------------
Store these 4 things in a structure, and let the user
configure concurrent_requests and chunk_size according to
their requirements.

/* in sftp.h */
typedef struct sftp_file_transfer_struct *sftp_file_transfer;

/* in sftp.c */
struct sftp_file_transfer_struct {
    int local_fd;
    sftp_file remote_file;
    int concurrent_requests;
    size_t chunk_size;
};

I was thinking whether we should somehow support remote-to-remote transfers. When the copy-data extension is available, it should be trivial:

https://github.com/openssh/openssh-portable/blob/master/PROTOCOL#L581

But if not, we will probably have to copy the data through the local machine, which would slightly complicate things; still, the process should not be that much different, and avoiding the need to store the whole file locally might be an advantage.

In that case, we would need a way to store two remote files and identify which is the "source" and which is the "target" one (probably with a union to keep the structure neat). This would also complicate the API below, so I will come back to this further down.
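
To make the union idea concrete, a sketch (all type and member names are made up for illustration): each endpoint is tagged, so local-to-remote, remote-to-local and remote-to-remote transfers all fit in one structure:

/* Illustrative sketch only; the names are made up. */
enum sftp_transfer_endpoint_type {
    SFTP_TRANSFER_ENDPOINT_LOCAL,
    SFTP_TRANSFER_ENDPOINT_REMOTE,
};

struct sftp_transfer_endpoint {
    enum sftp_transfer_endpoint_type type;
    union {
        int local_fd;           /* valid when type is ..._LOCAL */
        sftp_file remote_file;  /* valid when type is ..._REMOTE */
    } u;
};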

sftp_file_transfer sftp_file_transfer_new(int local_fd,
                                          sftp_file file)
{
    /* 1. Allocate a new structure of type
     * struct sftp_file_transfer_struct. */
    sftp_file_transfer ft = calloc(1, sizeof(*ft));
    if (ft == NULL) {
        return NULL;
    }

    /* 2. Assign its members the received local_fd and remote
     * file handle, and set concurrent_requests and chunk_size
     * to the defaults recommended by libssh (placeholder macro
     * names below). */
    ft->local_fd = local_fd;
    ft->remote_file = file;
    ft->concurrent_requests = SFTP_DEFAULT_CONCURRENT_REQUESTS;
    ft->chunk_size = SFTP_DEFAULT_CHUNK_SIZE;

    /* 3. Return the address of that structure. */
    return ft;
}

int sftp_file_transfer_options_set(sftp_file_transfer ft,
                                   what_to_set,
                                   void *ptr)
{
    /* what_to_set is just a temporary name to denote the
     * parameter in which we'll receive info about the field
     * of the struct that the user wants to set, e.g. if
     * SFTP_FILE_TRANSFER_OPTIONS_CHUNK_SIZE is received in
     * what_to_set, then chunk_size is to be set.
     */
    According to the value received in what_to_set,
    typecast ptr and assign the appropriate member of the
    structure that ft points to the value of the variable
    that ptr points to.
}

I find this approach over-complicated. It gives the great amount of flexibility we use for setting options for the ssh protocol, where there are dozens of them, but doing this with 2 options sounds too complex.

On the other hand, if we kept the constructor without arguments and set the source and target fds using this API as well, it would probably make sense, as it gives us more flexibility to tune other parameters later if needed. So for now, I would go with

SFTP_FILE_TRANSFER_OPTIONS_CHUNK_SIZE
SFTP_FILE_TRANSFER_OPTIONS_IN_FLIGHT_REQUESTS
SFTP_FILE_TRANSFER_OPTIONS_SOURCE_LOCAL
SFTP_FILE_TRANSFER_OPTIONS_SOURCE_REMOTE
SFTP_FILE_TRANSFER_OPTIONS_TARGET_LOCAL
SFTP_FILE_TRANSFER_OPTIONS_TARGET_REMOTE

(this also gives us a way to do local-to-local copy, which might be either excluded or handled transparently)
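
To make this concrete, the setter could dispatch roughly like this (just a sketch; the source/target fields refer to the endpoint union sketched above, and the remaining cases are elided):

/* Sketch only; assumes the option constants above, the endpoint
 * union sketched earlier, and SSH_OK/SSH_ERROR as used elsewhere
 * in libssh. */
int sftp_file_transfer_options_set(sftp_file_transfer ft,
                                   int what_to_set,
                                   const void *ptr)
{
    switch (what_to_set) {
    case SFTP_FILE_TRANSFER_OPTIONS_CHUNK_SIZE:
        ft->chunk_size = *(const size_t *)ptr;
        break;
    case SFTP_FILE_TRANSFER_OPTIONS_IN_FLIGHT_REQUESTS:
        ft->concurrent_requests = *(const int *)ptr; /* to be renamed */
        break;
    case SFTP_FILE_TRANSFER_OPTIONS_SOURCE_LOCAL:
        ft->source.type = SFTP_TRANSFER_ENDPOINT_LOCAL;
        ft->source.u.local_fd = *(const int *)ptr;
        break;
    /* ... SOURCE_REMOTE, TARGET_LOCAL, TARGET_REMOTE analogously ... */
    default:
        return SSH_ERROR;
    }
    return SSH_OK;
}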

int sftp_file_transfer_put(sftp_file_transfer ft)
{
    /* Upload the data of the file associated with ft->local_fd,
     * writing it to the file associated with ft->remote_file. */
}

int sftp_file_transfer_get(sftp_file_transfer ft)
{
    /* Download the data of the file associated with
     * ft->remote_file, storing it in the file associated with
     * ft->local_fd. */
}

These two functions have the same signature, so I would consider whether we should not mark the direction somehow in the transfer structure (either explicitly, or implicitly via the options described above) and, instead of these two, have only one sftp_file_transfer(ft);

The code for sftp_get() and sftp_put() that will
be discussed further closely resembles the benchmark
code for async download and upload added as a commit
in this merge request:
https://gitlab.com/libssh/libssh-mirror/-/merge_requests/375

------------------------------------------------------
What happens inside the get (download)
------------------------------------------------------
1. Query the remote file size using sftp_stat().
This is the amount of data we will wish to read.

2. Initially issue at most concurrent_requests read
requests to the server, while the number of bytes
requested (to read) is less than the size to read. The
issued requests are to be stored in a request queue.

3. while (the request queue is not empty)
{
    3.1) Dequeue a request from the request queue and
    get its response.

    3.2.1) If the response is eof, the file is smaller than
    expected.

    3.2.2) If the response is not eof, but a short read before
    reaching end of file, then it is a short read (not yet
    decided what to do in this case).

    3.2.3) If it is neither of the above cases, the read was
    successful; write the read data to the local file.

    3.3) If the number of bytes requested to read < file_size,
    issue one more read request and add it to the request
    queue.

    /* Since we'll make sure that the requested number of
     * bytes never exceeds the queried file size, 3.2.1 should
     * ideally never occur; that is why getting eof is
     * something unexpected. */
}
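
As a C sketch of the loop above (schematic only: I assume
sftp_aio_wait_read() as the read counterpart from the same
merge request, with signatures along those lines, MIN() as
the usual minimum macro, and fifo_push()/fifo_pop()/
fifo_empty() standing in for a simple queue of sftp_aio
handles):

/* Schematic download loop; error handling is mostly elided,
 * and the fifo_* helpers are assumed. */
static int sftp_get_sketch(int local_fd, sftp_file remote_file,
                           int concurrent_requests,
                           size_t chunk_size,
                           size_t file_size /* from sftp_stat() */)
{
    char buf[32768];       /* assumes chunk_size <= sizeof(buf) */
    size_t requested = 0;  /* bytes requested so far */
    sftp_aio aio = NULL;
    ssize_t got;
    int i;

    /* step 2: issue up to concurrent_requests reads up front */
    for (i = 0; i < concurrent_requests && requested < file_size; i++) {
        size_t len = MIN(chunk_size, file_size - requested);
        sftp_aio_begin_read(remote_file, len, &aio);
        fifo_push(aio);
        requested += len;
    }

    /* step 3: collect responses, reissuing while data remains */
    while (!fifo_empty()) {
        aio = fifo_pop();
        got = sftp_aio_wait_read(&aio, buf, sizeof(buf));
        if (got == SSH_ERROR) {
            return SSH_ERROR;    /* read failed */
        }
        if (got == 0) {
            return SSH_ERROR;    /* 3.2.1: unexpected eof */
        }
        /* a short read before eof is case 3.2.2 (undecided) */
        write(local_fd, buf, (size_t)got);    /* 3.2.3 */

        if (requested < file_size) {          /* 3.3 */
            size_t len = MIN(chunk_size, file_size - requested);
            sftp_aio_begin_read(remote_file, len, &aio);
            fifo_push(aio);
            requested += len;
        }
    }

    return SSH_OK;
}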

/* Issuing read requests for get (download) */
-----------------------------------------------------------
sftp_aio_begin_read() can be used here as it is.
(This is a function introduced in the above linked
merge request; it is used to issue a request to read
some number of bytes from a remote file.)

/* Getting the response for an issued read request for get (download) */
---------------------------------------------------------------------------------
First let's analyse the steps involved when a user uses
the existing low-level read API to get the response for a
previously issued read request and writes the received
data to a local file.

/* Steps 1 to 3 are performed by the libssh API */
1. Get the response from the server in msg (of type
sftp_message); the ssh_buffer containing the received read
data is msg->payload.

2. Call ssh_buffer_get_ssh_string(msg->payload), which
allocates a new ssh_string, takes the data out of the buffer
msg->payload (i.e. performs a memory copy), stores it in the
ssh_string and returns it.

3. Copy the data from the ssh_string to the application
buffer.

/* Step 4 is performed by the libssh user */
4. Write the data from the application buffer to the file.

In the case of sftp_get(), all these 4 steps are to be
performed by the libssh API. We can optimize this process and
avoid some memory copies if we write directly from the libssh
buffer to the file using the local file descriptor.

Something like:
write(fd,
      address where the data to write is in the ssh buffer,
      byte_count);
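
In C terms, the idea could look roughly like this (a sketch:
ssh_buffer_read_ptr() is a hypothetical accessor for the
buffer's current read position, which the buffer API extension
discussed later would need to add; ssh_buffer_get_u32() and
ssh_buffer_pass_bytes() are, as far as I can tell, existing
internal helpers in src/buffer.c):

/* Sketch of writing straight from the libssh buffer to the
 * local file; ssh_buffer_read_ptr() is hypothetical. */
uint32_t data_len;

ssh_buffer_get_u32(msg->payload, &data_len); /* SSH string length */
data_len = ntohl(data_len);
write(local_fd, ssh_buffer_read_ptr(msg->payload), data_len);
ssh_buffer_pass_bytes(msg->payload, data_len); /* mark consumed */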

-----------------------------------------------------
What happens inside the put (upload)
-----------------------------------------------------
1) Use some function like stat() to get the local file
size; this is the total number of bytes we wish to
write.

2) Issue at most concurrent_requests write requests,
while the number of bytes requested to write is less
than the total file size. Add the requests to a
request queue.

3) while (the request queue is not empty)
{
    3.1) Dequeue a request from the request queue
    and get its response.

    3.2) The response can be a successful write or
    a failure.

    3.3) Issue one more write request if needed (i.e. if
    requested_bytes < total number of bytes) and add
    it to the request queue.
}
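
In code terms, this mirrors the download sketch above (again
schematic; sftp_aio_begin_write() is assumed as the write
counterpart from the same merge request, and fifo_* and MIN()
are as in the download sketch):

/* Schematic upload loop; error handling mostly elided. */
for (i = 0; i < concurrent_requests && requested < file_size; i++) {
    size_t len = MIN(chunk_size, file_size - requested);
    read(local_fd, buf, len);  /* or directly into the libssh buffer */
    sftp_aio_begin_write(remote_file, buf, len, &aio);
    fifo_push(aio);
    requested += len;
}

while (!fifo_empty()) {
    aio = fifo_pop();
    if (sftp_aio_wait_write(&aio) == SSH_ERROR) {
        /* 3.2: the write failed */
    }
    if (requested < file_size) {  /* 3.3: issue one more request */
        size_t len = MIN(chunk_size, file_size - requested);
        read(local_fd, buf, len);
        sftp_aio_begin_write(remote_file, buf, len, &aio);
        fifo_push(aio);
        requested += len;
    }
}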

/* Issuing a write request for put (upload) */
------------------------------------------------------------
First let's analyse the steps involved in sending
a write request when the user is using the low-level
write API to upload a local file.

/* Step 1 is performed by the user */
1. Read data from the local file into the application
buffer.

/* Steps 2 to 4 are performed by the libssh API */
2. Copy the data from the application buffer to the libssh
buffer.

3. The data of the libssh buffer is encrypted and written to
session->out_buffer.

4. The data of session->out_buffer is written to the socket
write buffer.

In the case of sftp_put(), all these 4 steps need to be
performed by the libssh API while issuing each write request.
We can optimize this process and avoid one memory copy if we
read directly from the file into the libssh buffer, bypassing
the application buffer. We will need to do something like:

read(local_fd,
     address in the libssh buffer where the data should land,
     byte_count);

/* Getting the response for a write request */
------------------------------------------------------
sftp_aio_wait_write() can be used here as it is.
(This is a function introduced in the above linked merge
request; it is used to get the response of a previously
issued write request.)

Since the suggested optimisations need us to read directly
from the local file into a libssh buffer, or to write from a
libssh buffer to the local file, I think we will need to
extend the buffer API in src/buffer.c to achieve this.

This is probably a good idea to avoid copying buffers.
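
For the upload direction, such an extension could end up looking roughly like this (a sketch; ssh_buffer_allocate() is, as far as I recall, the internal src/buffer.c helper that reserves space in a buffer and returns a pointer to it, and using it for file I/O like this is exactly the proposed extension):

/* Sketch: read file data straight into the outgoing libssh
 * buffer, bypassing the application buffer. */
void *dest = ssh_buffer_allocate(buffer, (uint32_t)byte_count);
if (dest == NULL) {
    return SSH_ERROR;                 /* allocation failed */
}
if (read(local_fd, dest, byte_count) != (ssize_t)byte_count) {
    return SSH_ERROR;                 /* short read or I/O error */
}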

Another improvement would be to use the higher-level fread()
and fwrite() instead of the read() and write() system calls
on the local file, since they are buffered and using them
generally leads to fewer system calls (which could improve
performance).

I think this assumption still needs to be confirmed. From a fast google search, it should really not matter:

https://stackoverflow.com/questions/3002122/fastest-file-reading-in-c

It also mentions that you might be able to get some more performance by mmap-ing the file into memory. Or you can look for other ways to make the reads faster. But I think the network transfer speed or the CPU used for encryption will always be the bottleneck, so the local I/O speed is not that crucial.

As a side note, if you would like to test the I/O speeds, you can build libssh with the `WITH_INSECURE_NONE` cmake option, which will allow you to negotiate `none` ciphers and communicate over plaintext (obviously not useful for real-world use!) and benchmark the I/O, buffer copying and more. Note that this requires running a server that supports these mechanisms, so again ideally libssh. You can use the libssh test server from the following merge request, which I hope we will soon have merged:

https://gitlab.com/libssh/libssh-mirror/-/merge_requests/340

I am not yet sure what should be done if we get short reads
from the remote side without reaching end of file; if any
approach comes to your mind, kindly suggest. (According to the
sftp protocol,
https://datatracker.ietf.org/doc/html/draft-ietf-secsh-filexfer-02#page-13
short reads before reaching end of file should not occur for
regular disk files when reading. But they can occur for other
kinds of files, like device files; does using this put/get API
even make sense for device files or other kinds of special
files? If it doesn't, I think returning an error is the way
short reads before reaching end of file should be dealt with.)

As discussed yesterday, I think SFTP should be used for normal files. The special files or devices usually involve short reads and are very unlikely to be transferred through SFTP (I think you would already have an issue while stat-ing these files, as they have size 0).

If a short read occurs for a normal file, I think the right thing is to fail and report the issue to the user. If the file was already partially transferred, it might make sense to provide some information on how much was transferred, and maybe to provide a way to resume partial transfers, but I would probably keep this for a later time. Still, it would be good to think about this possible extension, where we would need to update the offset in the file structure with the size of the target file (or again check how openssh does this).

Kindly provide your thoughts on the proposed aspects (how the
API should look from the user's perspective, the working of
the API, the optimisations, etc.). Any suggestions to improve
the proposed high-level put/get API are appreciated.

Thanks for the great job so far!

Regards,
--
Jakub Jelen
Crypto Team, Security Engineering
Red Hat, Inc.

