On Thu, Oct 24, 2019 at 3:21 PM Ibrar Ahmed <ibrar.ah...@gmail.com> wrote:
> On Fri, Oct 18, 2019 at 4:12 PM Jeevan Chalke <jeevan.cha...@enterprisedb.com> wrote:
>> On Thu, Oct 17, 2019 at 10:51 AM Asif Rehman <asifr.reh...@gmail.com> wrote:
>>> Attached are the updated patches.
>>
>> I had a quick look over these changes and they look good overall.
>> However, here are a few review comments I caught while glancing through
>> patches 0002 and 0003.
>>
>> --- 0002 patch
>>
>> 1.
>> Can the lsn option be renamed to start-wal-location? That will be clearer.
>>
>> 2.
>> +typedef struct
>> +{
>> +    char        name[MAXPGPATH];
>> +    char        type;
>> +    int32       size;
>> +    time_t      mtime;
>> +} BackupFile;
>>
>> I think it will be good if we keep this structure in a common place so
>> that the client can also use it.
>>
>> 3.
>> + SEND_FILE_LIST,
>> + SEND_FILES_CONTENT,
>> Can the above two commands be renamed to SEND_BACKUP_MANIFEST and
>> SEND_BACKUP_FILE respectively?
>> The reason behind the first name change is that we are not getting only a
>> file list here; we are getting a few more details along with it. The
>> second will then be in line with
>> START_BACKUP/STOP_BACKUP/SEND_BACKUP_MANIFEST.
>>
>> 4.
>> Typos:
>> non-exlusive => non-exclusive
>> retured => returned
>> optionaly => optionally
>> nessery => necessary
>> totoal => total
>>
>> --- 0003 patch
>>
>> 1.
>> +static int
>> +simple_list_length(SimpleStringList *list)
>> +{
>> +    int         len = 0;
>> +    SimpleStringListCell *cell;
>> +
>> +    for (cell = list->head; cell; cell = cell->next, len++)
>> +        ;
>> +
>> +    return len;
>> +}
>>
>> I think it will be good if this goes into simple_list.c. That will help
>> other usages as well.
>>
>> 2.
>> Please revert these unnecessary changes:
>>
>> @@ -1475,6 +1575,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
>>           */
>>          snprintf(filename, sizeof(filename), "%s/%s", current_path, copybuf);
>> +
>>          if (filename[strlen(filename) - 1] == '/')
>>          {
>>              /*
>>
>> @@ -1528,8 +1622,8 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
>>           * can map them too.)
>>           */
>>          filename[strlen(filename) - 1] = '\0';  /* Remove trailing slash */
>> -
>>          mapped_tblspc_path = get_tablespace_mapping(&copybuf[157]);
>> +
>>          if (symlink(mapped_tblspc_path, filename) != 0)
>>          {
>>              pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
>>
>> 3.
>> Typos:
>> retrive => retrieve
>> takecare => take care
>> tablespae => tablespace
>>
>> 4.
>> The ParallelBackupEnd() function does not do anything for parallelism.
>> Would it be better to just rename it EndBackup()?
>>
>> 5.
>> To pass a tablespace path to the server in SEND_FILES_CONTENT, you are
>> reusing the LABEL option, which seems odd. How about adding a new option
>> for that?
>>
>> 6.
>> It would be good to have a comment in each function's prologue explaining
>> what the function actually does. For functions like:
>> GetBackupFilesList()
>> ReceiveFiles()
>> create_workers_and_fetch()
>>
>> Thanks
>>
>>> Thanks,
>>>
>>> --
>>> Asif Rehman
>>> Highgo Software (Canada/China/Pakistan)
>>> URL : www.highgo.ca
>>
>> --
>> Jeevan Chalke
>> Associate Database Architect & Team Lead, Product Development
>> EnterpriseDB Corporation
>> The Enterprise PostgreSQL Company
>
> I had a detailed discussion with Robert Haas at PostgresConf Europe about
> parallel backup. We discussed the current state of the patch and what needs
> to be done to get it committed.
>
> - The current patch uses processes to implement parallelism. There are many
>   reasons we need to use threads instead of processes.
>   To start with, as this is a client utility, it makes more sense to use
>   threads. The data needs to be shared among the different threads and the
>   main process; handling that is simpler than interprocess communication.

Yes, I agree. I have already converted the code to use threads instead of
processes. With a single-file fetching strategy, the competing
threads/processes have to coordinate over which file each one fetches next;
in a multiprocess application that coordination requires IPC, whereas
threads can simply share state. The thread-based approach avoids that
overhead.

> - Fetching a single file or multiple files at a time was also discussed. We
>   concluded that we need benchmarks to see whether disk I/O is a bottleneck
>   and whether parallel writing gives us any benefit. These benchmarks need
>   to be run on different hardware and different networks to identify the
>   real bottlenecks. In general, we agreed that we could start with fetching
>   one file at a time, but that will be revisited after the benchmarks are
>   done.

I'll share the updated patch in the next couple of days. After that, I'll
work on benchmarking it in the different environments I have available.

> - There is also an ongoing debate in this thread about whether we should
>   have one single tar file for all files or one tar file per thread. I
>   really want a single tar file, because the main purpose of the tar format
>   is to reduce the management of multiple files, and with one file per
>   thread we end up with many tar files. Therefore we need one master thread
>   that is responsible for writing the tar file, while all the other threads
>   receive the data from the network and stream it to the master thread.
>   This also supports the idea of a thread-based model rather than a
>   process-based approach, because the latter would require too much data
>   sharing between processes.
>   If we cannot achieve this, then we can disable the tar option for
>   parallel backup in the first version.

I am in favour of disabling the tar format for the first version of parallel
backup.

> - In the case of data sharing, we need to try to avoid unnecessary locking;
>   a more suitable algorithm for solving the reader-writer problem is
>   required.
>
> --
> Ibrar Ahmed

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca